Abstract

We present an approximation scheme for optimizing certain Quadratic Integer Program- 
ming problems with positive semidefinite objective functions and global linear constraints. 
This framework includes well known graph problems such as Minimum graph bisection, Edge 
expansion, Uniform sparsest cut, and Small Set expansion, as well as the Unique Games prob- 
lem. These problems are notorious for the existence of huge gaps between the known algorith- 
mic results and NP-hardness results. Our algorithm is based on rounding semidefinite pro- 
grams from the Lasserre hierarchy, and the analysis uses bounds for low-rank approximations 
of a matrix in Frobenius norm using columns of the matrix. 

For all the above graph problems, we give an algorithm running in time $n^{O(r/\varepsilon^2)}$ with approximation
ratio $\frac{1+\varepsilon}{\min\{1,\lambda_r\}}$, where $\lambda_r$ is the $r$'th smallest eigenvalue of the normalized graph
Laplacian $\mathcal{L}$. In the case of graph bisection and small set expansion, the number of vertices
in the cut is within lower-order terms of the stipulated bound. Our results imply a $(1 + O(\varepsilon))$
factor approximation in time $n^{O(r^*/\varepsilon^2)}$, where $r^*$ is the number of eigenvalues of $\mathcal{L}$ smaller than
$1 - \varepsilon$. This perhaps gives some indication as to why even showing mere APX-hardness for
these problems has been elusive, since the reduction must produce graphs with a slowly grow- 
ing spectrum (and classes like planar graphs which are known to have such a spectral property 
often admit good algorithms owing to their nice structure). 

For Unique Games, we give a factor $\left(1 + \frac{2+\varepsilon}{\lambda_r}\right)$ approximation for minimizing the number
of unsatisfied constraints in $n^{O(r/\varepsilon)}$ time. This improves an earlier bound for solving Unique
Games on expanders, and also shows that Lasserre SDPs are powerful enough to solve well-known
integrality gap instances for the basic SDP.

We also give an algorithm for independent sets in graphs that performs well when the 
Laplacian does not have too many eigenvalues bigger than $1 + o(1)$.
1 Introduction



The theory of approximation algorithms has made major strides in the last two decades, pinning 
down, for many basic optimization problems, the exact (or asymptotic) threshold up to which 
efficient approximation is possible. Some notorious problems, however, have withstood this wave 
of progress; for these problems the best known algorithms deliver super-constant approximation 
ratios, whereas NP-hardness results do not even rule out say a factor 1.1 (or sometimes even 
a factor $(1 + \varepsilon)$ for any constant $\varepsilon > 0$) approximation algorithm. Examples of such problems
include graph partitioning problems such as minimum bisection, uniform sparsest cut, and small- 
set expansion; finding a dense subgraph induced on k vertices; minimum linear arrangement; and 
constraint satisfaction problems such as minimum CNF deletion or Unique Games. 

There has been evidence of three distinct flavors for the hardness of these problems: (i) Ruling
out a polynomial time approximation scheme (PTAS) assuming that $\mathrm{NP} \not\subseteq \bigcap_{\varepsilon > 0} \mathrm{BPTIME}(2^{n^{\varepsilon}})$
via quasi-random PCPs [Kho06, AMS07]; (ii) Inapproximability results within some constant fac- 
tor assuming average-case hardness of refuting random 3SAT instances [Fei02]; and (iii) Inap- 
proximability within super-constant factors under a strong conjecture on the intractability of the 
small-set expansion (SSE) problem [RST10]. While (iii) gives the strongest hardness results, it is 
conditioned on the conjectured hardness of SSE [RS10], an assumption that implies the Unique 
Games conjecture, and arguably does not yet have as much evidence in its support as the com- 
plexity assumptions made in (i) or (ii). 

In this work, we give a unified algorithm, based on powerful semidefinite programs from 
the Lasserre hierarchy, for several of these problems, and a broader class of quadratic integer pro- 
gramming problems with linear constraints (more details are in Section 1.1 below). Our algorithms 
deliver a good approximation ratio if the eigenvalues of the Laplacian of the underlying graph increase
at a reasonable rate. In particular, for all the above graph partitioning problems, we get a
$(1 + \varepsilon)/\min\{\lambda_r, 1\}$ approximation factor in $n^{O_\varepsilon(r)}$ time, where $\lambda_r$ is the $r$'th smallest eigenvalue
of the normalized Laplacian (which has eigenvalues in the interval $[0, 2]$). Note that if $\lambda_r \ge 1 - \varepsilon$,
then we get a $(1 + O(\varepsilon))$ approximation ratio.

Perspective. The direct algorithmic interpretation of our results is simply that one can probably 
get good approximations for graphs that are pretty "weak-expanders," in that we only require 
lower bounds on higher eigenvalues rather than on $\lambda_2$ as in the case of expanders. In terms of our
broader understanding of the complexity of approximating these problems, our results perhaps 
point to why even showing APX-hardness for these problems has been difficult, as the reduction 
must produce graphs with a very slowly growing spectrum, with many ($n^{\Omega(1)}$, or even $n^{1-o(1)}$ for
near-linear time reductions) small eigenvalues. Trivial examples of such graphs are the disjoint
union of many small components (taking the union of $r$ components ensures $\lambda_r = 0$), but these
are of course easily handled by working on each component separately. We note that Laplacians 
of planar graphs, bounded genus graphs, and graphs excluding fixed minors, have many small 
eigenvalues [KLPT10], but these classes are often easier to handle algorithmically due to their 
rich structure — for example, conductance and edge expansion problems are polynomial time 
solvable on planar graphs [PP93]. Also, the recent result of [ABS10] shows that if $\lambda_r = o(1)$ for
some $r = n^{\Omega(1)}$, then the graph must have an $n^{1-\Omega(1)}$ sized subset with very few edges leaving
it. Speculating somewhat boldly, maybe these results suggest that graphs with too many small
eigenvalues are also typically not hard instances for these problems. 
Our results also give some explanation for our inability so far to show integrality gaps for
even 4 rounds of the Lasserre hierarchy for problems which we only know to be hard assuming
the Unique Games conjecture (UGC). In fact, it is entirely consistent with current knowledge that
just $O(1)$ rounds of the Lasserre hierarchy gives an improvement over the 0.878 performance ratio
of the Goemans-Williamson algorithm for Max Cut, and refutes the UGC!

1.1 Summary of results 

Let us now state our specific results informally. 

Graph partitioning. We begin with results for certain cut/graph partitioning problems. For
simplicity, we state the results for unweighted graphs; the body of the paper handles weighted
graphs. Below $\lambda_p$ denotes the $p$'th smallest eigenvalue of the normalized Laplacian $\mathcal{L}$ of the graph
$G$, defined as $\mathcal{L} = D^{-1/2}(D - A)D^{-1/2}$ where $A$ is the adjacency matrix and $D$ is a diagonal matrix
with node degrees on the diagonal. (In the stated approximation ratios, $\lambda_r$ (resp. $2 - \lambda_{n-r}$) should
be understood as $\min\{\lambda_r, 1\}$ (resp. $\min\{2 - \lambda_{n-r}, 1\}$), but we don't make this explicit to avoid
notational clutter.) The algorithm's running time is $n^{O(r/\varepsilon^2)}$ in each case. This runtime arises due
to solving the standard semidefinite programs (SDP) lifted with $O(r/\varepsilon^2)$ rounds of the Lasserre
hierarchy. Our results are shown via an efficient rounding algorithm whose runtime is $n^{O(1)}$; the
exponential dependence on $r$ is thus limited to solving the SDP.
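
To make the spectral quantities concrete, here is a minimal numerical sketch, assuming NumPy and a dense symmetric adjacency matrix with no isolated vertices. The helper names (`normalized_laplacian_eigs`, `threshold_rank`, `cycle`, `complete`) are hypothetical and serve only as illustration. It computes the eigenvalues of $D^{-1/2}(D - A)D^{-1/2}$ and counts those below $1 - \varepsilon$, the quantity that governs the running time.

```python
import numpy as np

def normalized_laplacian_eigs(A):
    """Ascending eigenvalues of D^{-1/2} (D - A) D^{-1/2} for a symmetric adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    # Assumption: every vertex has positive degree, so D^{-1/2} is well defined.
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.diag(deg) - A
    N = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(N)

def threshold_rank(eigs, eps):
    """Number of eigenvalues smaller than 1 - eps."""
    return int(np.sum(eigs < 1.0 - eps))

def cycle(n):
    """Adjacency matrix of the n-cycle (a graph with many small eigenvalues)."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    return A

def complete(n):
    """Adjacency matrix of the complete graph (an expander-like spectrum)."""
    return np.ones((n, n)) - np.eye(n)

eigs_cycle = normalized_laplacian_eigs(cycle(12))
eigs_complete = normalized_laplacian_eigs(complete(12))
print("cycle:   ", threshold_rank(eigs_cycle, 0.1), "eigenvalues below 0.9")
print("complete:", threshold_rank(eigs_complete, 0.1), "eigenvalues below 0.9")
```

The cycle has several eigenvalues below the threshold while the complete graph has only the trivial eigenvalue 0, which is the kind of contrast the guarantees above are sensitive to.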

• MAXIMUM CUT AND MINIMUM UNCUT: Given a graph $G$ on $n$ vertices with a partition
leaving at most $b$ many edges uncut, we can find a partition that leaves at most $\frac{1+\varepsilon}{2-\lambda_{n-r}}\, b$
many edges uncut. (We can also get an approximation guarantee of $\left(1 + \frac{2+\varepsilon}{\lambda_r}\right)$ for Minimum
Uncut as a special case of our result for Unique Games.)

• MINIMUM (MAXIMUM) BISECTION: Given a graph $G$ on $n$ vertices with a bisection (partition
into two equal parts) cutting (uncutting) at most $b$ edges, we can find a near-bisection, with
each side having $\frac{n}{2} \pm O_\varepsilon(\sqrt{n})$ vertices, that cuts at most $\frac{1+\varepsilon}{\lambda_r}\, b$ (uncuts at most $\frac{1+\varepsilon}{2-\lambda_{n-r}}\, b$) edges
respectively.

• SMALL-SET EXPANSION (SSE): Given a graph $G$ on $n$ vertices with a set of volume $\mu$ with
at most $b$ edges leaving it, we can find a set $U$ of volume $\mu \pm O_\varepsilon(\sqrt{d_{\max}\, \mu})$ with at most $\frac{1+\varepsilon}{\lambda_r}\, b$
edges leaving $U$. We can also find such a set with volume $\mu(1 \pm \varepsilon)$, which will be a better
guarantee for highly irregular graphs.

• Various graph partitioning problems which involve minimizing the ratio of cut size to the size or
volume of partitions. For each problem below, we can find a non-empty set $U \subset V$ whose
value is at most $\frac{1+\varepsilon}{\lambda_r}\, \mathrm{OPT}$.

- UNIFORM SPARSEST CUT: $\phi_G(U)$, defined as the number of edges in the
cut $(U, V \setminus U)$ divided by $|U|\,|V \setminus U|$.

- EDGE EXPANSION: $h_G(U)$, defined as the ratio of the number of edges leaving $U$ to
the number of nodes in $U$, where $U$ is the smaller side of the cut.

- NORMALIZED CUT: $\mathrm{ncut}_G(U)$, defined as the number of edges in the cut
$(U, V \setminus U)$ divided by the product of the volumes of $U$ and $V \setminus U$.

- CONDUCTANCE: $\mathrm{conductance}_G(U)$, defined as the fraction of edges incident on $U$
that leave $U$, where $U$ is the side of the cut with smaller volume.
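
The four ratio objectives listed above can be evaluated for a given cut as in the following minimal sketch for unweighted graphs. The function name `cut_objectives` is hypothetical, and conductance is computed with the common convention of dividing the cut size by the smaller volume, which is an assumption about how "edges incident on $U$" are counted.

```python
def cut_objectives(edges, n, U):
    """Evaluate the four quotient objectives for a proper non-empty subset U of {0, ..., n-1}.

    edges: list of undirected edges (u, v) of an unweighted graph on n vertices.
    """
    U = set(U)
    comp = set(range(n)) - U
    deg = [0] * n
    cut = 0
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        if (u in U) != (v in U):      # the edge crosses the cut (U, V \ U)
            cut += 1
    vol_U = sum(deg[u] for u in U)    # volume = sum of degrees
    vol_comp = sum(deg[v] for v in comp)
    return {
        "uniform_sparsest_cut": cut / (len(U) * len(comp)),
        "edge_expansion": cut / min(len(U), len(comp)),
        "normalized_cut": cut / (vol_U * vol_comp),
        # Assumed convention: cut size over the smaller volume.
        "conductance": cut / min(vol_U, vol_comp),
    }

# Two triangles joined by a single edge; U is one of the triangles.
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
values = cut_objectives(two_triangles, 6, {0, 1, 2})
for name, value in values.items():
    print(name, value)
```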

In each case, we can also handle boundary conditions stipulating that U must contain some subset 
F of nodes, and avoid some other disjoint subset B of nodes. This feature is used for our results 
on the above ratio-minimization objectives as well as finding a set of volume $\mu(1 \pm \varepsilon)$ for SSE.

We can give similar guarantees for the $k$-way partitioning versions of these problems (when
the version makes sense). For example, for the $k$-way section problem of splitting the vertices into
$k$ equal parts to minimize the number of cut edges, we can give a $(1 + \varepsilon)/\lambda_r$ approximation in
$n^{O(kr/\varepsilon^2)}$ time, with $O(\sqrt{n \log(k/\varepsilon)})$ error in the size of each part.

Remark 1 (Subspace enumeration). We note that for conductance (and related problems with quotient
objectives mentioned above), it is possible to get an $O(1/\lambda_r)$ approximation in $n^{O(r)}$ time by
searching for a good cut in the $r$-dimensional eigenspace corresponding to the $r$ smallest eigenvalues
(we thank David Steurer for pointing out that this is implicit in the subspace enumeration
results of [ABS10]). It is not clear, however, if such methods can give a $(1+\varepsilon)/\lambda_r$ type ratio. Further,
this method does not apply in the presence of a balance requirement such as Minimum Bisection,
as it could violate the balance condition by an $\Omega(n)$ amount, or for Unique Games (where we do
not know how to control the spectrum of the lifted graph). □



PSD Quadratic Integer Programs. In addition to the above cut problems, our method ap- 
plies more abstractly to the class of minimization quadratic integer programs (QIP) with positive 
semidefinite (PSD) cost functions and arbitrary linear constraints. 

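• QIP WITH PSD COSTS: Given a PSD matrix $A \in \mathbb{R}^{(V \times [k]) \times (V \times [k])}$, consider the problem of
finding $x \in \{0,1\}^{V \times [k]}$ minimizing $x^T A x$ subject to: (i) exactly one of $\{x_u(i)\}_{i \in [k]}$ equals 1
for each $u$, and (ii) the linear constraints $Bx \ge c$. We find such an $x$ with $x^T A x \le \frac{1+\varepsilon}{\min\{1, \lambda_r(\mathcal{A})\}}\, \mathrm{OPT}$,
where $\mathcal{A} = \mathrm{diag}(A)^{-1/2} \cdot A \cdot \mathrm{diag}(A)^{-1/2}$.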



Unique Games. We next state our result for Unique Games. This is not a direct application of 
the result for QIP; see Section 1.2 for details on the difficulties. 



• UNIQUE GAMES: Given a Unique Games instance with constraint graph $G = (V, E)$, label
set $[k]$, and bijective constraints $\pi_e$ for each edge, if the optimum assignment $\sigma : V \to [k]$
fails to satisfy $\eta$ of the constraints, we can find an assignment that fails to satisfy at most
$\eta \left(1 + \frac{2+\varepsilon}{\lambda_r}\right)$ of the constraints.

In this case, we are only able to get a weaker $\approx 1 + 2/\lambda_r$ approximation factor, which is always
larger than 2. In this context, it is interesting to note that minimizing the number of unsatisfied
constraints in Unique Games is known to be APX-hard; for example, the known NP-hardness for
approximating Max Cut [Hås01, TSSW00] implies a factor $(5/4 - \varepsilon)$ hardness for this problem (and
indeed for the special case of Minimum Uncut).

Remark 2 (UG on expanders). Arora et al. [AKK+08] showed that Unique Games is easy on expanders,
and gave an $O\left(\frac{\log(1/\mathrm{OPT})}{\lambda_2}\right)$ approximation to the problem of minimizing the number of
unsatisfied constraints, where OPT is the fraction of unsatisfied constraints in the optimal solution.
For the subclass of "linear" Unique Games, they achieved an approximation ratio of $O(1/\lambda_2)$
without any dependence on OPT. A factor $O(1/\lambda_2)$ approximation ratio was achieved for general
Unique Games instances by Makarychev and Makarychev [MM10] (assuming $\lambda_2$ is large enough,
they also get an $O(1/h_G)$ approximation where $h_G$ is the Cheeger constant). Our result achieves an
approximation factor of $O(1/\lambda_r)$, if one is allowed $n^{O(r)}$ time.

For instances of $\Gamma$MAX2LIN, the paper [AKK+08] also gives an $n^{O(r)}$ time algorithm that satisfies
all but a fraction $O(\mathrm{OPT}/z_r(G))$ of constraints, where $z_r(G)$ is the value of the $r$-round Lasserre
SDP relaxation of Sparsest Cut on $G$. For $r = 1$, $z_1(G) = \lambda_2$. But the growth rate of $z_r(G)$, e.g., its
relation to the Laplacian spectrum, was not known. □

Remark 3 (SDP gap instances). Our algorithm also shows that the Khot-Vishnoi UG gap instance
for the basic SDP [KV05] has $O(1)$ integrality gap for the lifted SDP corresponding to $\mathrm{poly}(\log n)$
rounds of the Lasserre hierarchy. In particular, these instances admit quasi-polynomial time constant
factor approximations. This latter result is already known and was shown by Kolla [Kol10] using
spectral techniques. Our result shows that strong enough SDPs also suffice to tackle these instances.
In a similar vein, applying the ABS graph decomposition [ABS10] to split the graph into
components with at most $n^{\varepsilon}$ small eigenvalues while cutting very few edges, one also gets that
$n^{\varepsilon^{\Omega(1)}}$ rounds of the Lasserre hierarchy suffice to well-approximate Unique Games on instances
with at most $\varepsilon$ fraction unsatisfied constraints. □



Independent Set in graphs. We also give a rounding algorithm for the natural Lasserre SDP for
independent set in graphs. Here, our result gives an algorithm running in $n^{O(r/\varepsilon^2)}$ time
that finds an independent set of size $\approx \frac{n}{2 d_{\max}} \cdot \frac{1}{\lambda_{n-r} - 1}$, where $\lambda_{n-r} > 1$ is the $r$'th largest eigenvalue
of the graph's normalized Laplacian. Thus even exact independent set is easy for graphs for which
the number of eigenvalues greater than $\approx 1 + \frac{1}{4 d_{\max}}$ is small.



1.2 Our Techniques 

Our results follow a unified approach, based on an SDP relaxation of the underlying integer program.
The SDP is chosen from the Lasserre hierarchy [Las02], and its solution has vectors $x_T(\sigma)$
corresponding to local assignments to every subset $T \subset V$ of at most $r'$ vertices. (Such an SDP is
said to belong to $r'$ rounds of the Lasserre hierarchy.) The vectors satisfy dot product constraints
corresponding to consistency of pairs of these local assignments. (See Section 2 for a formal description.)

Given an optimal solution to the Lasserre SDP, we give a rounding method based on local
propagation, similar to the rounding algorithm for Unique Games on expanders in [AKK+08]. We
first find an appropriate subset $S$ of $r'$ nodes (called the seed nodes). One could simply try all such
subsets in $n^{r'}$ time, though there is an $O(n^5)$ time algorithm to locate the set $S$ as well. Then for
each assignment $f$ to nodes in $S$, we randomly extend the assignment to all nodes by assigning,
for each $u \in V \setminus S$ independently, a random value from its marginal distribution based on $x_{S \cup \{u\}}$
conditioned on the assignment $f$ to $S$.

After arithmetizing the performance of the rounding algorithm, and making a simple but crucial
observation that lets us pass from higher order Lasserre vectors to vectors corresponding
to single vertices, the core step in the analysis is the following: Given vectors $\{X_v \in \mathbb{R}^{\Upsilon}\}_{v \in V}$
and an upper bound on a positive semidefinite (PSD) quadratic form $\sum_{u,v \in V} L_{uv} \langle X_u, X_v \rangle =
\mathrm{Tr}(X^T X L) \le \eta$, place an upper bound on the sum of the squared distances of $X_u$ from the span
of $\{X_s\}_{s \in S}$, i.e., the quantity $\sum_u \|X_S^{\perp} X_u\|^2 = \mathrm{Tr}(X^T X_S^{\perp} X)$. (Here $X \in \mathbb{R}^{\Upsilon \times V}$ is the matrix with
columns $\{X_v : v \in V\}$, and $X_S^{\perp}$ is the projection onto the orthogonal complement of that span.)

We relate the above question to the problem of column-selection for low-rank approximations
to a matrix, studied in many recent works [DV06, DR10, BDMI11, GS11]. It is known by the
recent works [BDMI11, GS11]¹ that one can pick $r/\varepsilon$ columns $S$ such that $\mathrm{Tr}(X^T X_S^{\perp} X)$ is at most
$1/(1 - \varepsilon)$ times the error of the best rank-$r$ approximation to $X$ in Frobenius norm, which equals
$\sum_{i > r} \sigma_i$ where the $\sigma_i$'s are the eigenvalues of $X^T X$ in decreasing order. Combining this with the
upper bound $\mathrm{Tr}(X^T X L) \le \eta$, we deduce an approximation ratio of $\left(1 + \frac{1}{(1-\varepsilon)\lambda_r}\right)$ for our algorithm.
Also, the independent rounding of each $u$ implies, by standard Chernoff bounds, that any linear
constraint (such as a balance condition) is met up to lower order deviations.
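
The quantities in this column-selection step can be explored numerically with the following minimal NumPy sketch. The function names are hypothetical, and `greedy_columns` is a simple greedy heuristic used only for illustration; it is not the column-selection procedure of [BDMI11, GS11] and carries no guarantee. The sketch computes $\mathrm{Tr}(X^T X_S^{\perp} X)$ for a column subset $S$ and compares it with the best rank-$r$ error $\sum_{i>r} \sigma_i$.

```python
import numpy as np

def residual_after_projection(X, S):
    """Tr(X^T X_S^perp X): total squared distance of the columns of X from span{X_s : s in S}."""
    if len(S) == 0:
        return float(np.sum(X * X))
    A = X[:, list(S)]
    # A @ pinv(A) is the orthogonal projector onto the column span of A,
    # even when the chosen columns are linearly dependent.
    R = X - A @ (np.linalg.pinv(A) @ X)
    return float(np.sum(R * R))

def best_rank_r_error(X, r):
    """Sum of all but the r largest eigenvalues of X^T X (squared Frobenius error of the best rank-r fit)."""
    sig = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]
    return float(np.sum(sig[r:]))

def greedy_columns(X, s):
    """Illustrative greedy heuristic: repeatedly add the column that most reduces the residual."""
    S = []
    for _ in range(s):
        candidates = [j for j in range(X.shape[1]) if j not in S]
        best = min(candidates, key=lambda j: residual_after_projection(X, S + [j]))
        S.append(best)
    return S

rng = np.random.default_rng(0)          # arbitrary test data, not from the paper
X = rng.standard_normal((6, 10))
for s in (1, 3, 6):
    S = greedy_columns(X, s)
    print(s, residual_after_projection(X, S), best_rank_r_error(X, s))
```

Any $s$ columns span a subspace of dimension at most $s$, so the residual can never drop below the best rank-$s$ error; the cited results say that slightly more than $r$ columns suffice to come within a $1/(1-\varepsilon)$ factor of the rank-$r$ error.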

Note that the above gives an approximation ratio $\approx 1 + 1/\lambda_r$, which always exceeds 2. To get
our improved $(1 + \varepsilon)/\lambda_r$ guarantee, we need a more refined analysis, based on iterated application
of column selection along with some other ideas.

For Unique Games, a direct application of our framework for quadratic IPs would require
relating the spectrum of the constraint graph $G$ of the Unique Games instance to that of the lifted
graph $\hat{G}$. There are such results known for random lifts, for instance [Oli10]; saying something
in the case of arbitrary lifts, however, seems very difficult.² We therefore resort to an indirect
approach, based on embedding the set of $k$ vectors $\{x_u(i)\}_{i \in [k]}$ for a vertex into a single vector
$X_u$ with some nice distance preserving properties that enable us to relate quadratic forms on the
lifted graph to a proxy form on the base constraint graph. This idea was also used in [AKK+08] for
the analysis of their algorithm on expanders, where they used an embedding based on non-linear
tensoring. In our case, we need the embedding to also preserve distances from certain higher-dimensional
subspaces (in addition to preserving pairwise distances); this favors an embedding
that is as "linear" as possible, which we obtain by passing to a tensor product space.

1.3 Related work on Lasserre SDPs in approximation 

The Lasserre SDPs seem very powerful, and as mentioned earlier, for problems shown to be hard 
assuming the UGC (such as beating Goemans-Williamson for Max Cut), integrality gaps are not 
known even for a small constant number of rounds. A gap instance for Unique Games is known 
if the Lasserre constraints are only approximately satisfied [KPS10]. It is interesting to contrast this 
with our positive result. The error needed in the constraints for the construction in [KPS10] is
$r/(\log \log n)^c$ for some $c < 1$, where $n$ is the number of vertices and $r$ the number of rounds. Our
analysis requires that the Lasserre consistency constraints are met exactly. In Appendix A, we present
an algorithm that produces such valid Lasserre SDP solutions in time $(kn)^{O(r)}\, O(\log(1/\varepsilon_0))$ with
an additive error of $\varepsilon_0$ in linear constraints, and an objective value at most $\varepsilon_0$ more than optimal.

Strong Lasserre integrality gaps have been constructed for certain approximation problems
that are known to be NP-hard. Schoenebeck proved a strong negative result that even $\Omega(n)$ rounds
of the Lasserre hierarchy has an integrality gap $\approx 2$ for Max 3-LIN [Sch08]. Via reductions from
this result, Tulsiani showed gap instances for Max $k$-CSP (for $\Omega(n)$ rounds), and instances with
$n^{1-o(1)}$ gap for $\approx 2^{\sqrt{\log n}}$ rounds for the Independent Set and Chromatic Number problems [Tul09].

¹ In fact our work [GS11] was motivated by the analysis in this paper.

² It is known that $\lambda_{r k n^{\delta}}(\mathcal{L}(\hat{G})) \ge \delta \lambda_r(\mathcal{L}(G))$ [ABS10], but this large multiplicative $n^{\delta}$ slack makes this ineffective for

In terms of algorithmic results, even few rounds of Lasserre is already as strong as the SDPs 
used to obtain the best known approximation algorithms for several problems — for example, 
3 rounds of Lasserre is enough to capture the ARV SDP relaxation for Sparsest Cut [ARV09], 
and Chlamtac used the third level of the Lasserre hierarchy to get improvements for coloring 
3-colorable graphs [Chl07]. In terms of positive results that use a larger (growing) number of 
Lasserre rounds, we are aware of only two results. Chlamtac and Singh used $O(1/\gamma^2)$ rounds of
Lasserre hierarchy to find an independent set of size $\Omega(n^{\gamma^2/8})$ in 3-uniform hypergraphs with an
independent set of size $\gamma n$ [CS08]. Karlin, Mathieu, and Nguyen show that $1/\varepsilon$ rounds of Lasserre
SDP gives a $(1 + \varepsilon)$ approximation to the Knapsack problem [KMN10].

However, there are mixed hierarchies, which are weaker than Lasserre and based on combining
an LP characterized by local distributions (from the Sherali-Adams hierarchy) with a simple
SDP, that have been used for several approximation algorithms. For instance, for the above-mentioned
result on independent sets in 3-uniform hypergraphs, an $n^{\Omega(\gamma^2)}$ sized independent set
can be found with $O(1/\gamma^2)$ levels from the mixed hierarchy. Raghavendra's result states that for
every constraint satisfaction problem, assuming the Unique Games conjecture, the best approxi- 
mation ratio is achieved by a small number of levels from the mixed hierarchy [Rag08]. For further 
information and references on the use of SDP and LP hierarchies in approximation algorithms, we 
point the reader to the excellent book chapter [CT11]. 

In an independent work, Barak, Raghavendra, and Steurer [BRS11] consider the above-mentioned 
mixed hierarchy, and extend the local propagation rounding of [AKK+08] to these SDPs in a man- 
ner similar to our work. Their analysis methods are rather different from ours. Instead of column- 
based low-rank matrix approximation, they use the graph spectrum to infer global correlation 
amongst the SDP vectors from local correlation, and use it iteratively to argue that a random
seed set works well in the rounding. Their main result is an additive approximation for Max 2-CSPs.
Translating to the terminology used in this paper, given a 2CSP instance over domain size $k$ with
optimal value (fraction of satisfied constraints) equal to $v$, they give an algorithm to find an assignment
with value $v - O\left(k\sqrt{1 - \lambda_r}\right)$ based on $r' \gg kr$ rounds of the mixed hierarchy. (Here
$\lambda_r$ is the $r$'th smallest eigenvalue of the normalized Laplacian of the constraint graph; note though
that $\lambda_r$ needs to be fairly close to 1 for the bound to kick in.) For the special case of Unique Games,
they get the better performance of $v - O\left(\sqrt[4]{1 - \lambda_r}\right)$ which doesn't degrade with $k$, and also a factor
$O(1/\lambda_r)$ approximation for minimizing the number of unsatisfied constraints in time exponential
in $k$.

For 2CSPs, our results only apply to a restricted class (corresponding to PSD quadratic forms), 
but we get approximation-scheme style multiplicative guarantees for the harder minimization objective,
and can handle global linear constraints. (Also, for Unique Games, our algorithm has running
time polynomial in the number of labels $k$. In terms of $r$, the runtime in [BRS11] has a better $2^{O(r)}$
type dependence instead of our $n^{O(r)}$ bounds.) Our approach enables us to get approximation-scheme
style guarantees for several notorious graph partitioning problems that have eluded even
APX-hardness. 
1.4 Organization

We begin with the definition of Lasserre SDPs we will use in Section 2. To illustrate the main 
ideas in our work, we present them in a self-contained way for a simplified setting in Section 3, by 
developing an algorithm for the Minimum Bisection problem. 

The rest of our results are proved in the remaining sections: quadratic integer programming in
Section 4; graph partitioning problems such as conductance, Uniform Sparsest Cut, and SSE (on 
general, non-regular, weighted graphs) in Section 5; Unique Games in Section 6; and finally, inde- 
pendent sets in graphs in Section 8. 

The main technical theorem about rounding that is used to analyze our algorithm for quadratic 
integer programming is proved in Section 7. We discuss the accuracy needed (by our rounding) 
in the solution to the Lasserre SDP in Appendix A. 

2 Lasserre hierarchy of semidefinite programs 

We present the formal definitions of the Lasserre family of SDP relaxations [Las02], tailored to the 
setting of the problems we are interested in, where the goal is to assign to each vertex/variable
from a set V a label from [k] = {1,2,..., k}. 

Definition 1 (Lasserre vector set). Given a set of variables $V$ and a set $[k] = \{1, 2, \ldots, k\}$ of labels,
and an integer $r \ge 0$, a vector set $x$ is said to satisfy $r$ levels of Lasserre hierarchy constraints on $k$ labels,
denoted

$x \in \mathrm{Lasserre}^{(r)}(V \times [k])$,

if it satisfies the following conditions:

1. For each set $S \in \binom{V}{\le r+1}$, there exists a function $x_S : [k]^S \to \mathbb{R}^{\Upsilon}$ that associates a vector of some finite
dimension $\Upsilon$ with each possible labeling of $S$. We use $x_S(f)$ to denote the vector associated with the
labeling $f \in [k]^S$. For singletons $u \in V$, we will use $x_u(i)$ and $x_u(i^u)$ for $i \in [k]$ interchangeably.

For $f \in [k]^S$ and $v \in S$, we use $f(v)$ as the label $v$ receives from $f$. Also given sets $S$ with labeling
$f \in [k]^S$ and $T$ with labeling $g \in [k]^T$ such that $f$ and $g$ agree on $S \cap T$, we use $f \circ g$ to denote the
labeling of $S \cup T$ consistent with $f$ and $g$: if $u \in S$, $(f \circ g)(u) = f(u)$ and vice versa.

2. $\|x_{\emptyset}\|^2 = 1$.

3. $\langle x_S(f), x_T(g) \rangle = 0$ if there exists $u \in S \cap T$ such that $f(u) \ne g(u)$.

4. $\langle x_S(f), x_T(g) \rangle = \langle x_A(f'), x_B(g') \rangle$ if $S \cup T = A \cup B$ and $f \circ g = f' \circ g'$.

5. For any $u \in V$, $\sum_{j \in [k]} \|x_u(j)\|^2 = \|x_{\emptyset}\|^2$.

6. (implied by above constraints) For any $S \in \binom{V}{\le r+1}$, $u \in S$ and $f \in [k]^{S \setminus \{u\}}$, $\sum_{g \in [k]^u} x_S(f \circ g) = x_{S \setminus \{u\}}(f)$.

We will use $X(i)$ to denote a matrix of size $\Upsilon \times n$, $X(i) \in \mathbb{R}^{\Upsilon \times V}$, whose columns are the vectors $\{x_u(i)\}_{u \in V}$.
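
As a sanity check on these conditions, the following minimal sketch builds a vector set from a genuine probability distribution over labelings, which is the standard way to see that integral solutions and their convex combinations are feasible. The toy distribution `dist` and the helper names are hypothetical. Each vector $x_S(f)$ has one coordinate per assignment in the support, equal to the square root of its probability when the assignment agrees with $f$ on $S$ and 0 otherwise, so inner products equal joint probabilities.

```python
import math

def lasserre_vector(dist, labeling):
    """x_S(f) for the labeling f (dict vertex -> label) of S = labeling.keys().

    dist: list of (probability, assignment) pairs, assignment a dict vertex -> label.
    """
    return [math.sqrt(p) if all(a[u] == i for u, i in labeling.items()) else 0.0
            for p, a in dist]

def dot(x, y):
    return sum(s * t for s, t in zip(x, y))

# Hypothetical toy distribution over labelings of V = {0, 1, 2} with k = 2 labels.
dist = [(0.5, {0: 1, 1: 1, 2: 2}),
        (0.3, {0: 1, 1: 2, 2: 2}),
        (0.2, {0: 2, 1: 2, 2: 1})]

x_empty = lasserre_vector(dist, {})

# Condition 2: the vector for the empty set has unit norm.
print(dot(x_empty, x_empty))

# Condition 3: labelings that disagree on a shared vertex (here vertex 1) are orthogonal.
print(dot(lasserre_vector(dist, {0: 1, 1: 1}), lasserre_vector(dist, {1: 2, 2: 2})))

# Condition 4: the inner product depends only on the combined labeling.
lhs = dot(lasserre_vector(dist, {0: 1}), lasserre_vector(dist, {1: 2}))
rhs = dot(lasserre_vector(dist, {0: 1, 1: 2}), x_empty)
print(lhs, rhs)

# Condition 5: squared norms of the singleton vectors of a vertex sum to 1.
norm_sums = [sum(dot(lasserre_vector(dist, {u: j}), lasserre_vector(dist, {u: j}))
                 for j in (1, 2)) for u in (0, 1, 2)]
print(norm_sums)
```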
We now add linear constraints to the SDP formulation. 






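Definition 2 (Linear constraints in Lasserre SDPs). Given a matrix $B = [b_1 \ldots b_\ell] \in \mathbb{R}^{(V \times [k]) \times \ell}$ and a
vector $c = (c_1, \ldots, c_\ell)^T \in \mathbb{R}^{\ell}$, $x \in \mathrm{Lasserre}^{(r)}(V \times [k])$ is said to satisfy linear constraints $\{(b_i, c_i)\}_{i=1}^{\ell}$
if the following holds for all $i \in [\ell]$:

For all subsets $S \in \binom{V}{\le r}$ and $f \in [k]^S$,

$\sum_{u \in V,\, g \in [k]^u} \langle x_S(f), x_u(g) \rangle\, b_i(u, g) \le c_i \langle x_S(f), x_{\emptyset} \rangle$,

which is equivalent to

$\sum_{u \in V,\, g \in [k]^u} \|x_{S \cup \{u\}}(f \circ g)\|^2\, b_i(u, g) \le c_i \|x_S(f)\|^2$.

We denote the set of such $x$ as $x \in \mathrm{Lasserre}^{(r)}(V \times [k], B^{\le c})$.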

Remark 4 (Convenient matrix notation). One common expression we will use throughout this
paper is the following. For matrices $X \in \mathbb{R}^{\Upsilon \times V}$ and $M \in \mathbb{R}^{V \times V}$:

$\mathrm{Tr}(X^T X M) = \sum_{u,v \in V} M_{u,v} \langle X_u, X_v \rangle$.

Note that if $M$ is positive semidefinite (denoted $M \succeq 0$), then $\mathrm{Tr}(X^T X M) \ge 0$.³
Also, if $L$ is the Laplacian matrix of an undirected graph $G = (V, E)$, we have

$\mathrm{Tr}(X^T X L) = \sum_{e = \{u,v\} \in E} \|X_u - X_v\|^2$

where $X_u$ denotes the column of $X$ corresponding to $u \in V$. □
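
The Laplacian identity in Remark 4 is easy to confirm numerically. The following minimal sketch uses plain Python lists, arbitrary random test data, and hypothetical helper names; it evaluates both sides on a small graph.

```python
import random

def laplacian(n, edges):
    """Unnormalized Laplacian D - A of an unweighted undirected graph, as a list of lists."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    return L

def quad_form(cols, M):
    """Tr(X^T X M) = sum over u, v of M[u][v] * <X_u, X_v>, where cols[u] is the column X_u."""
    n = len(cols)
    return sum(M[u][v] * sum(a * b for a, b in zip(cols[u], cols[v]))
               for u in range(n) for v in range(n))

def edge_sum(cols, edges):
    """Sum over edges {u, v} of ||X_u - X_v||^2."""
    return sum(sum((a - b) ** 2 for a, b in zip(cols[u], cols[v])) for u, v in edges)

rng = random.Random(1)                       # arbitrary seed for reproducibility
graph_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
columns = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(4)]

trace_side = quad_form(columns, laplacian(4, graph_edges))
edge_side = edge_sum(columns, graph_edges)
print(trace_side, edge_side)
```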

The analysis of our rounding algorithm will involve projections on certain subspaces, which we 
define next. 

Definition 3 (Projection operators). Given $x \in \mathrm{Lasserre}^{(r)}(V \times [k])$, we define $\Pi : \binom{V}{\le r+1} \to \mathbb{R}^{\Upsilon \times \Upsilon}$
as the projection matrix onto the span of $\{x_S(f)\}_{f \in [k]^S}$ for given $S$:

$\Pi_S = \sum_{f \in [k]^S} \bar{x}_S(f) \cdot \bar{x}_S(f)^T$.

(Here $\bar{x}_S(f)$ is the unit vector in the direction of $x_S(f)$ if $x_S(f)$ is nonzero, and 0 otherwise.)

Similarly we define $P : \binom{V}{\le r+1} \to \mathbb{R}^{\Upsilon \times \Upsilon}$ as the matrix corresponding to projection onto the span of
$\{x_v(f)\}_{v \in S,\, f \in [k]}$: $P_S = \sum_{v \in S,\, f \in [k]} \bar{x}_v(f) \cdot \bar{x}_v(f)^T$.

We will denote by $\Pi_S^{\perp} = I - \Pi_S$ and $P_S^{\perp} = I - P_S$ the projection matrices onto the respective orthogonal
complements, where $I$ denotes the identity matrix of appropriate dimension.

Remark 5 (Errors in Lasserre solution). In Appendix A, we present a simple algorithm which finds 
an SDP solution that satisfies all Lasserre consistency constraints exactly, with a small additive 
error in the linear constraints and the objective value. □ 

3 The use of this inequality in various places is the reason why our analysis only works for minimizing PSD quadratic 
forms. 
3 Case Study: Approximating Minimum Bisection



All our algorithmic results follow a unified method (except small set expansion on irregular 
graphs and unique games, both of which we treat separately). In this section, we will illustrate 
the main ideas involved in our work in a simplified setting, by working out progressively better
approximation ratios for the following basic, well-studied problem: Given as input a graph
$G = (V, E)$ and an integer size parameter $\mu$, find a subset $U \subset V$ with $|U| = \mu$ that minimizes the
number of edges between $U$ and $V \setminus U$, denoted $\Gamma_G(U)$. The special case when $\mu = |V|/2$ and we
want to partition the vertex set into two equal parts is the minimum bisection problem. We will
loosely refer to the general $\mu$ case also as minimum bisection.⁴

For simplicity we will assume $G$ is unweighted and $d$-regular; however, all our results given
in Section 5 are for any weighted undirected graph $G$. We can formulate this problem as a binary
integer programming problem as follows:

$\min \sum_{e = \{u,v\} \in E} (x_u(1) - x_v(1))^2$, (1)

subject to $\sum_u x_u(1) = \mu$; $\forall u,\ x_u(1) + x_u(2) = 1$; and $x \in \{0,1\}^{V \times [2]}$.

If we let $L$ be the Laplacian matrix for $G$, we can rewrite the objective as $\eta = x(1)^T L x(1)$. We will
denote by $\mathcal{L} = \frac{1}{d} L$ the normalized Laplacian of $G$.

Note that the above is a quadratic integer programming (QIP) problem with linear constraints. 
The somewhat peculiar formulation is in anticipation of the Lasserre semidefinite programming 
relaxation for this problem, which we describe below. 
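
For very small graphs, formulation (1) can be solved by exhaustive search, which is useful as a reference point when experimenting with relaxations. The following minimal sketch uses a hypothetical function name and runs in time exponential in the number of vertices, so it is an illustration only. It enumerates every set $U$ of size $\mu$, sets $x_u(1) = 1$ exactly for $u \in U$, and evaluates the objective.

```python
from itertools import combinations

def min_cut_of_size(n, edges, mu):
    """Brute-force optimum of formulation (1): the best U with |U| = mu and its cut size."""
    best_val, best_U = None, None
    for U in combinations(range(n), mu):
        # x1[u] plays the role of x_u(1); x_u(2) = 1 - x_u(1) is implicit.
        x1 = [1 if u in U else 0 for u in range(n)]
        val = sum((x1[u] - x1[v]) ** 2 for u, v in edges)   # counts the cut edges
        if best_val is None or val < best_val:
            best_val, best_U = val, set(U)
    return best_val, best_U

# Two triangles joined by one edge: the minimum bisection cuts only the joining edge.
bridge_graph = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
opt_val, opt_U = min_cut_of_size(6, bridge_graph, 3)
print(opt_val, opt_U)
```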

3.1 Lasserre relaxation for Minimum Bisection 

Let $b$ be the vector on $V \times [2]$ with $b_v(1) = 1$ and $b_v(2) = 0$ for every $v \in V$. For an integer $r' \ge 0$, the
$r'$-round Lasserre SDP for Minimum Bisection consists of finding $x \in \mathrm{Lasserre}^{(r')}(V \times [k], b^{=\mu})$
that minimizes the objective function

$\sum_{e = \{u,v\} \in E(G)} \|x_u(1) - x_v(1)\|^2$. (2)

It is easy to see that this is indeed a relaxation of our original QIP formulation (1). 

3.2 Main theorem on rounding 

Let $x$ be an (optimal) solution to the above $r'$-round Lasserre SDP. We will always use $\eta$ in this
section to refer to the objective value of $x$, i.e., $\eta = \sum_{e = \{u,v\} \in E(G)} \|x_u(1) - x_v(1)\|^2$.

Our ultimate goal in this section is to give an algorithm to round the SDP solution $x$ to a good
cut $U$ of size very close to $\mu$, and prove the theorem below.

⁴ We will be interested in finding a set of size $\mu \pm o(\mu)$, so we avoid the terminology Balanced Separator which typically
refers to the variant where $\Omega(n)$ slack is allowed in the set size.
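Theorem 4. For all $r \ge 1$ and $\varepsilon > 0$, there exists $r' = O\left(\frac{r}{\varepsilon^2}\right)$, such that given $x \in \mathrm{Lasserre}^{(r')}(V \times
[k], b^{=\mu})$ with objective value (2) equal to $\eta$, one can find in randomized $n^{O(1)}$ time, a set $U$ satisfying
the following two properties w.h.p.:

1. $\Gamma_G(U) \le \frac{1+\varepsilon}{\min\{1, \lambda_r(\mathcal{L})\}}\, \eta$.

2. $\mu(1 - o(1)) = \mu - O\left(\sqrt{\mu \log(1/\varepsilon)}\right) \le |U| \le \mu + O\left(\sqrt{\mu \log(1/\varepsilon)}\right) = \mu(1 + o(1))$.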

Since one can solve the Lasserre relaxation in $n^{O(r')}$ time using Theorem 39, we get the result
claimed in the introduction: an $n^{O(r/\varepsilon^2)}$ time factor $(1 + \varepsilon)/\min\{\lambda_r, 1\}$ approximation algorithm;
the formal theorem, for general (non-regular, weighted) graphs, appears as Corollary 15 in Section 5.
Note that if $t = \arg\min_r \{r \mid \lambda_r(\mathcal{L}) \ge 1 - \varepsilon/2\}$, then this gives an $n^{O_\varepsilon(t)}$ time algorithm for
approximating minimum bisection to within a $(1 + \varepsilon)$ factor, provided we allow $O(\sqrt{n})$ imbalance.



3.3 The rounding algorithm 

Recall that the solution $x \in \mathrm{Lasserre}^{(r')}(V \times [k], b^{=\mu})$ contains a vector $x_T(f)$ for each $T \in \binom{V}{\le r'}$ and every possible labeling $f \in [2]^T$ of $T$. Our approach to round $x$ to a solution $\tilde{x}$ to the integer program (1) is similar to the label propagation approach used in [AKK+08].

Consider fixing a set of $r'$ nodes, $S \in \binom{V}{r'}$, and assigning a label $f(s)$ to every $s \in S$ by choosing $f \in [2]^S$ with probability $\|x_S(f)\|^2$. (The best choice of $S$ can be found by brute-forcing over all of $\binom{V}{r'}$, since solving the Lasserre SDP takes $n^{O(r')}$ time anyway. But there is also a faster method to find a good $S$, as mentioned in Theorem 7.) Conditional on choosing a specific labeling $f$ of $S$, we propagate the labeling to other nodes as follows: independently for each $u \in V$, choose $i \in [2]$ and assign $\tilde{x}_u(i) \leftarrow 1$ with probability

\[ \Pr[\tilde{x}_u(i) = 1] = \frac{\|x_{S \cup \{u\}}(f \circ i_u)\|^2}{\|x_S(f)\|^2} = \frac{\langle x_S(f), x_u(i) \rangle}{\|x_S(f)\|^2} . \]

Observe that if $u \in S$, the label of $u$ will always be $f(u)$. Finally, output $U = \{u \mid \tilde{x}_u(1) = 1\}$ as the cut. Below $\Pi_S$ denotes the projection matrix from Definition 3.
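The propagation step can be illustrated on a toy instance. The sketch below is not the paper's implementation: it assumes a hypothetical distribution over labelings of four nodes and builds vectors $x_T(f)$ from it (coordinate $\omega$ equals $\sqrt{p_\omega}$ when labeling $\omega$ agrees with $f$ on $T$), so that the inner products used above become ordinary joint probabilities. The names `DISTRIBUTION`, `lasserre_vector`, `seed_labeling_probability` and `propagation_probability` are illustrative only.

```python
from math import sqrt

# Hypothetical toy data: (probability, labeling) pairs, labeling[u] in {1, 2}.
DISTRIBUTION = [
    (0.5, (1, 1, 2, 2)),
    (0.3, (1, 2, 1, 2)),
    (0.2, (2, 2, 1, 1)),
]


def lasserre_vector(nodes, labels):
    """x_T(f): coordinate w is sqrt(p_w) if labeling w agrees with f on T, else 0."""
    return [
        sqrt(p) if all(w[u] == lab for u, lab in zip(nodes, labels)) else 0.0
        for p, w in DISTRIBUTION
    ]


def dot(a, b):
    return sum(s * t for s, t in zip(a, b))


def seed_labeling_probability(seed_nodes, seed_labels):
    """Probability ||x_S(f)||^2 of choosing labeling f for the seed set S."""
    x_s = lasserre_vector(seed_nodes, seed_labels)
    return dot(x_s, x_s)


def propagation_probability(seed_nodes, seed_labels, u, i):
    """Pr[x~_u(i) = 1] = <x_S(f), x_u(i)> / ||x_S(f)||^2."""
    x_s = lasserre_vector(seed_nodes, seed_labels)
    x_u = lasserre_vector((u,), (i,))
    return dot(x_s, x_u) / dot(x_s, x_s)
```

With seed set $S = \{0\}$ labeled $1$, node $0$ keeps label $1$ with probability one, and the probabilities for any other node sum to one over the two labels, matching the observation in the text.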

Lemma 5. For the above rounding procedure, the size of the cut produced, $\Gamma_G(U)$, satisfies

\[ \mathbb{E}[\Gamma_G(U)] = \eta + \sum_{(u,v) \in E} \langle \Pi_S^\perp x_u(1), \Pi_S^\perp x_v(1) \rangle . \tag{3} \]

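Proof. Note that for $u \ne v$, and $i, j \in [2]$,

\[ \Pr[\tilde{x}_u(i) = 1 \wedge \tilde{x}_v(j) = 1] = \sum_f \|x_S(f)\|^2 \, \frac{\langle x_S(f), x_u(i) \rangle}{\|x_S(f)\|^2} \, \frac{\langle x_S(f), x_v(j) \rangle}{\|x_S(f)\|^2} = \sum_f \langle \overline{x_S(f)}, x_u(i) \rangle \langle \overline{x_S(f)}, x_v(j) \rangle , \]

where $\overline{x_S(f)} = x_S(f)/\|x_S(f)\|$. Since $\{\overline{x_S(f)}\}_f$ is an orthonormal basis, the above expression can be written as the inner product of projections of $x_u(i)$ and $x_v(j)$ onto the span of $\{x_S(f)\}_{f \in [2]^S}$, which we denote by $\Pi_S$. Let us now calculate the expected number $\Gamma_G(U)$ of edges cut by this rounding. It is slightly more convenient to treat edges $e = \{u, v\}$ as two directed edges $(u, v)$ and $(v, u)$, and count directed edges $(u, v)$ with $u \in U$ and $v \in V \setminus U$ in the cut. Therefore,

\[ \mathbb{E}[\Gamma_G(U)] = \sum_{(u,v) \in E} \langle \Pi_S x_u(1), \Pi_S x_v(2) \rangle = \sum_{(u,v) \in E} \langle \Pi_S x_u(1), \Pi_S (x_\emptyset - x_v(1)) \rangle \]
\[ = \sum_{(u,v) \in E} \left( \langle \Pi_S x_u(1), \Pi_S x_\emptyset \rangle - \langle \Pi_S x_u(1), \Pi_S x_v(1) \rangle \right) . \tag{4} \]

By using the fact that $\langle \Pi_S x_u(1), \Pi_S x_\emptyset \rangle = \langle x_u(1), \Pi_S x_\emptyset \rangle = \langle x_u(1), x_\emptyset \rangle = \|x_u(1)\|^2$, we can rewrite Equation (4) in the following way:

\[ = \sum_{(u,v) \in E} \left( \|x_u(1)\|^2 - \langle \Pi_S x_u(1), \Pi_S x_v(1) \rangle \right) \]
\[ = \sum_{(u,v) \in E} \left( \|x_u(1)\|^2 - \langle x_u(1), x_v(1) \rangle + \langle \Pi_S^\perp x_u(1), \Pi_S^\perp x_v(1) \rangle \right) \]
\[ = \eta + \sum_{(u,v) \in E} \langle \Pi_S^\perp x_u(1), \Pi_S^\perp x_v(1) \rangle . \qquad \square \]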

Note that the matrix $\Pi_S$ depends on vectors $x_S(f)$ which are hard to control because we do not have any constraint relating $x_S(f)$ to a known matrix. The main driving force behind all our results is the following fact, which follows since given any $u \in S$ and $i \in [2]$, $x_u(i) = \sum_{f : f(u) = i} x_S(f)$ by Lasserre constraints.

Observation 6. For all $S \in \binom{V}{r'}$,

\[ \mathrm{span}\left(\{x_S(f)\}_{f \in [2]^S}\right) \supseteq \mathrm{span}\left(\{x_u(i)\}_{u \in S, i \in [2]}\right) . \]

Equivalently, for $P_S$ being the projection matrix onto the span of $\{x_u(i)\}_{u \in S, i \in [2]}$, $P_S \preceq \Pi_S$.

Thus we will try to upper bound the term in Equation (3) by replacing $\Pi_S^\perp$ with $P_S^\perp$, but we cannot directly perform this switch: $\langle P_S^\perp x_u(i), P_S^\perp x_v(j) \rangle$ might be negative while $\Pi_S^\perp x_u(i) = 0$.

3.4 Factor $1 + \frac{1}{\lambda_r}$ approximation of cut value

Our first bound is by directly upper bounding Equation (3) in terms of $\|\Pi_S^\perp x_u(i)\|^2 \le \|P_S^\perp x_u(i)\|^2$. Using Cauchy-Schwarz and Arithmetic-Geometric Mean inequalities, (3) implies that the expected number of edges cut is upper bounded by

\[ \eta + \frac{1}{2} \sum_{e=(u,v) \in E} \left( \|\Pi_S^\perp x_u(1)\|^2 + \|\Pi_S^\perp x_v(1)\|^2 \right) = \eta + d \sum_u \|\Pi_S^\perp x_u(1)\|^2 \le \eta + d \sum_u \|P_S^\perp x_u(1)\|^2 . \tag{5} \]

Now define $X_u \triangleq x_u(1)$, and let $X \in \mathbb{R}^{\Upsilon \times V}$ be the matrix with columns $X_u$. By (2), we have the objective value $\eta = \mathrm{Tr}(X^T X L)$. Let $X_S^\Pi$ be the projection matrix onto the span of $\{X_u\}_{u \in S}$, and let $X_S^\perp = I - X_S^\Pi$. Since this set is a subset of $\{x_u(i)\}_{u \in S, i \in [2]}$, we have $X_S^\Pi \preceq P_S$. Therefore, we can bound (5) further as

\[ \mathbb{E}[\text{number of edges cut}] \le \eta + d \sum_u \|X_S^\perp X_u\|^2 = \eta + d \cdot \mathrm{Tr}(X^T X_S^\perp X) . \tag{6} \]

To get the best upper bound, we want to pick $S \in \binom{V}{r'}$ to minimize $\sum_{u \in V} \|X_S^\perp X_u\|^2$. It is a well known fact that among all projection matrices $M$ of rank $r'$ (not necessarily restricted to projection onto columns of $X$), the minimum value of $\sum_u \|M^\perp X_u\|^2 = \mathrm{Tr}(X^T M^\perp X)$ is achieved by the matrix $M$ projecting onto the space of the largest $r'$ singular vectors of $X$. Further, this minimum value equals $\sum_{i \ge r'+1} \sigma_i$, where $\sigma_i = \sigma_i(X)$ denotes the squared $i$-th largest singular value of $X$ (equivalently, $\sigma_i(X)$ is the $i$-th largest eigenvalue of $X^T X$). Hence $\mathrm{Tr}(X^T X_S^\perp X) \ge \sum_{i \ge r'+1} \sigma_i$ for every choice of $S$. The following theorem from [GS11] shows the existence of $S$ which comes close to this lower bound:

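Theorem 7. [GS11] For every real matrix $X$ with column set $V$, and positive integers $r \le r'$, we have

\[ \delta_{r'}(X) \triangleq \min_{S \in \binom{V}{r'}} \mathrm{Tr}(X^T X_S^\perp X) \le \frac{r'+1}{r'-r+1} \sum_{i \ge r+1} \sigma_i . \]

In particular, for all $\varepsilon \in (0, 1)$, $\delta_{r/\varepsilon} \le \frac{1}{1-\varepsilon} \left( \sum_{i \ge r+1} \sigma_i \right)$. Further, one can find a set $S \in \binom{V}{r'}$ achieving the claimed bounds in deterministic $O(r n^4)$ time.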
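To make the quantity $\delta_{r'}(X)$ concrete, here is a minimal brute-force sketch in plain Python. It is not the deterministic algorithm of [GS11]; it simply enumerates all subsets of a given size, which is only feasible for tiny matrices. The function names and the example matrix are hypothetical.

```python
from itertools import combinations


def dot(a, b):
    return sum(s * t for s, t in zip(a, b))


def residual_after_projection(columns, selected):
    """Return sum_u ||X_S^perp X_u||^2 for S = selected (indices into columns)."""
    # Gram-Schmidt: orthonormal basis of the span of the selected columns.
    basis = []
    for s in selected:
        v = list(columns[s])
        for b in basis:
            c = dot(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm_sq = dot(v, v)
        if norm_sq > 1e-12:  # skip numerically dependent columns
            norm = norm_sq ** 0.5
            basis.append([vi / norm for vi in v])
    # Energy of each column that lies outside the span.
    return sum(
        dot(col, col) - sum(dot(col, b) ** 2 for b in basis)
        for col in columns
    )


def best_column_subset(columns, size):
    """Brute-force delta_size(X): the smallest residual over subsets of that size."""
    return min(
        (residual_after_projection(columns, S), S)
        for S in combinations(range(len(columns)), size)
    )


# Hypothetical example: four columns in R^3.
EXAMPLE_COLUMNS = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.1],
]
```

On `EXAMPLE_COLUMNS`, three well-chosen columns already span the whole space, so the residual drops to (numerically) zero, and the optimal residual never increases as the subset size grows.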



Remark 6. Prior to our paper [GS11], it was shown in [BDMI11] that $\delta_{r(2+\varepsilon)/\varepsilon} \le (1+\varepsilon) \sum_{i \ge r+1} \sigma_i$. The improvement in the bound on $r'$ from $2r/\varepsilon$ to $r/\varepsilon$ to achieve a $(1+\varepsilon)$ approximation is not of major significance to our application, but since the tight bound is now available, we decided to state and use it. □

Remark 7 (Running time of our algorithms). If the Lasserre SDP can be solved faster than $n^{O(r')}$ time, perhaps in $\exp(O(r')) n^c$ time for some absolute constant $c$, then the fact that we can find $S$ deterministically in only $O(n^5)$ time would lead to a similar runtime for the overall algorithm. □

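Picking the subset $S^* \in \binom{V}{r'}$ that achieves the bound (6) guaranteed by Theorem 7, we have

\[ \mathrm{Tr}(X^T X_{S^*}^\perp X) = \delta_{r'}(X) \le (1-\varepsilon)^{-1} \sum_{i > r} \sigma_i . \]

In order to relate this quantity to the SDP objective value $\eta = \mathrm{Tr}(X^T X L)$, we use the fact that $\mathrm{Tr}(X^T X L)$ is minimized when eigenvectors of $X^T X$ and $L$ are matched in reverse order: the $i$-th largest eigenvector of $X^T X$ corresponds to the $i$-th smallest eigenvector of $L$. Letting $0 = \lambda_1(\mathcal{L}) \le \lambda_2(\mathcal{L}) \le \cdots \le \lambda_n(\mathcal{L}) \le 2$ be the eigenvalues of the normalized graph Laplacian matrix, $\mathcal{L} = \frac{1}{d} L$, we have

\[ \frac{\eta}{d} = \frac{1}{d} \mathrm{Tr}(X^T X L) \ge \sum_i \sigma_i(X) \lambda_i(\mathcal{L}) \ge \sum_{i \ge r+1} \sigma_i(X) \lambda_{r+1}(\mathcal{L}) \ge (1-\varepsilon) \lambda_{r+1}(\mathcal{L}) \, \delta_{r/\varepsilon}(X) . \]

Plugging this into (6), we can conclude our first bound: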
Theorem 8. For all positive integers $r$ and $\varepsilon \in (0, 1)$, given an SDP solution $x \in \mathrm{Lasserre}^{(r/\varepsilon)}(V \times [k], b^{=\mu})$, the rounding algorithm given in 3.3 cuts at most

\[ \left( 1 + \frac{1}{(1-\varepsilon) \lambda_{r+1}(\mathcal{L})} \right) \sum_{e=(u,v) \in E} \|x_u(1) - x_v(1)\|^2 \]

edges in expectation.

In particular, the algorithm cuts at most a factor $\left( 1 + \frac{1}{(1-\varepsilon) \lambda_{r+1}(\mathcal{L})} \right)$ more edges than the optimal cut with $\mu$ nodes on one side. 5

Note that $\lambda_n(\mathcal{L}) \le 2$, hence even if we use $n$ rounds of Lasserre relaxation, for which $x$ is an integral solution, we can only show an upper bound $\ge \frac{3}{2}$. Although this is too weak by itself for our purposes, this bound will be crucial to obtain our final bound.



3.5 Improved analysis and factor $\frac{1}{\lambda_r}$ approximation on cut value

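First notice that Equation (3) can be written as

\[ \mathbb{E}[\text{number of edges cut}] = \mathrm{Tr}(X^T X L) + \mathrm{Tr}(X^T \Pi_S^\perp X (I - L)) = \mathrm{Tr}(X^T \Pi_S^\perp X) + \mathrm{Tr}(X^T \Pi_S X L) . \tag{7} \]

If the value of this expression is larger than $\frac{\eta}{(1-\varepsilon) \lambda_{r+1}} + \eta\varepsilon$, then the value of $\mathrm{Tr}(X^T \Pi_S X L)$ has to be larger than $\varepsilon\eta$ due to the bound we proved on $\mathrm{Tr}(X^T \Pi_S^\perp X)$. Consider choosing another subset $T$ that achieves the bound $\delta_r(\Pi_S^\perp X)$. The crucial observation is that distances between neighboring nodes on vectors $\Pi_S^\perp X$ have decreased by an additive factor of $\eta\varepsilon$,

\[ \mathrm{Tr}(X^T \Pi_S^\perp X L) = \mathrm{Tr}(X^T X L) - \mathrm{Tr}(X^T \Pi_S X L) < \eta(1 - \varepsilon) , \]

so that $\mathrm{Tr}(X^T \Pi_{S \cup T}^\perp X) < (1 - \varepsilon) \frac{\eta}{(1-\varepsilon) \lambda_{r+1}}$. Now, if we run the rounding algorithm with $S \cup T$ as the seed set, and (7) with $S \cup T$ in place of $S$ is larger than $\frac{\eta}{(1-\varepsilon) \lambda_{r+1}} + \eta\varepsilon$, then $\mathrm{Tr}(X^T \Pi_{S \cup T} X L) > 2\varepsilon\eta$. Hence

\[ \mathrm{Tr}(X^T \Pi_{S \cup T}^\perp X L) \le \mathrm{Tr}(X^T X L) - \mathrm{Tr}(X^T \Pi_{S \cup T} X L) < \eta(1 - 2\varepsilon) . \]

Picking another set $T'$, we will have $\mathrm{Tr}(X^T \Pi_{S \cup T \cup T'}^\perp X) < (1 - 2\varepsilon) \frac{\eta}{(1-\varepsilon) \lambda_{r+1}}$. Continuing this process, if the quantity (7) is not upper bounded by $\frac{\eta}{(1-\varepsilon) \lambda_{r+1}} + \eta\varepsilon$ after $\lceil 1/\varepsilon \rceil$ many such iterations, then the total projection distance becomes

\[ \mathrm{Tr}(X^T \Pi^\perp X) < (1 - \lceil 1/\varepsilon \rceil \varepsilon) \frac{\eta}{(1-\varepsilon) \lambda_{r+1}} \le 0 , \]

which is a contradiction. For the formal statement and proof in a more general setting, see Theorem 31 in Section 7.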

Theorem 9. For all integers $r \ge 1$ and $\varepsilon \in (0, 1)$, letting $r' = O\left(\frac{r}{\varepsilon^2}\right)$, given an SDP solution $x \in \mathrm{Lasserre}^{(r')}(V \times [k], b^{=\mu})$, the expected number of edges cut by the above rounding algorithm is at most $(1+\varepsilon)/\min\{1, \lambda_{r+1}(\mathcal{L})\}$ times the size of the optimal cut with $\mu$ nodes on one side. (Here $\lambda_{r+1}(\mathcal{L})$ is the $(r+1)$-th smallest eigenvalue of the normalized Laplacian $\mathcal{L} = \frac{1}{d} L$ of $G$.)

5 We will later argue that the cut will also meet the balance requirement up to $o(\mu)$ vertices.
3.6 Bounding Set Size



We now analyze the balance of the cut, and show that we can ensure that $|U| = \mu \pm o(\mu)$ in addition to $\Gamma_G(U)$ being close to the expected bound of Theorem 9 (and similarly for Theorem 8).

Let $S^*$ be fixed to be $\arg\min_{S \in \binom{V}{r'}} \mathrm{Tr}(X^T X_S^\perp X)$. We will show that conditioned on finding cuts with small $\Gamma_G(U)$, the probability that one of them has $|U| \approx \mu$ is bounded away from zero. We can use a simple Markov bound to show that there is a non-zero probability that both cut size and set size are within a factor 3 of the corresponding bounds. But by exploiting the independence in our rounding algorithm and Lasserre relaxations of linear constraints, we can do much better. Note that in the $r'$-round Lasserre relaxation, for each $f \in [2]^{S^*}$, due to the set size constraint in the original IP formulation, $x$ satisfies:

\[ \sum_u \langle x_{S^*}(f), x_u(1) \rangle = \mu \, \|x_{S^*}(f)\|^2 . \]



This implies that conditioned on the choice of $f$, the expectation of $\sum_u \tilde{x}_u(1)$ is $\mu$ and the events $\tilde{x}_u(1) = 1$ for various $u$ are independent. Applying the Chernoff bound, we get

\[ \Pr\left[ \left| \sum_u \tilde{x}_u(1) - \mu \right| \ge 2 \sqrt{\mu \log \tfrac{1}{\zeta}} \right] \le o(\zeta) . \]

Consider choosing $f \in [2]^{S^*}$ so that $\mathbb{E}[\text{number of edges cut} \mid f] \le \mathbb{E}[\text{number of edges cut}] \triangleq b$. By Markov's inequality, if we pick such an $f$, $\Pr[\text{number of edges cut} \ge (1+\zeta) b] \le 1 - \frac{\zeta}{2}$, where the probability is over the random propagation once $S^*$ and $f$ are fixed.

Hence with probability at least $\frac{\zeta}{2} - o(\zeta)$, the solution $\tilde{x}$ will yield a cut $U$ with $\Gamma_G(U) \le (1+\zeta) b$ and size $|U|$ in the range $\mu \pm 2 \sqrt{\mu \log \frac{1}{\zeta}}$. Taking $\zeta = \varepsilon$ and repeating this procedure $O(\varepsilon^{-1} \log n)$ times, we get a high probability statement and finish our main Theorem 4 on minimum bisection.
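The concentration of the set size can be checked empirically. The following is a minimal simulation sketch, assuming each node joins $U$ independently with a given probability (as happens after conditioning on the seed labeling); the function names, the seed value and the example parameters are illustrative choices, not part of the original analysis.

```python
import math
import random


def sample_cut_size(probabilities, rng):
    """Round each node independently: node u joins U with probability probabilities[u]."""
    return sum(1 for p in probabilities if rng.random() < p)


def fraction_within_window(probabilities, zeta, trials, seed=0):
    """Fraction of trials with |U| inside mu +/- 2*sqrt(mu*log(1/zeta))."""
    rng = random.Random(seed)
    mu = sum(probabilities)  # expected size of U
    window = 2.0 * math.sqrt(mu * math.log(1.0 / zeta))
    hits = sum(
        1
        for _ in range(trials)
        if abs(sample_cut_size(probabilities, rng) - mu) <= window
    )
    return hits / trials
```

For example, `fraction_within_window([0.5] * 100, zeta=0.1, trials=200)` uses $\mu = 50$ and a window of roughly $\pm 21$, which is more than four standard deviations of the size, so essentially every trial should land inside it.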



4 Algorithm for Quadratic Integer Programming 



First we will state a couple of concentration inequalities in a form suitable for us.

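Definition 10. Given $a \in \mathbb{R}^n$, consider $n$ independent Bernoulli random variables $X_i$ with $\Pr[X_i = 1] = p_i$ and $\Pr[X_i = 0] = 1 - p_i$ for some $p_i \in [0, 1]$. For any $0 < \varepsilon < 1$, we define $\Delta_\varepsilon(a, \mu)$ as the minimum value that satisfies

\[ \Pr\left[ \left| \sum_i a_i X_i - \mathbb{E}\Big[ \sum_j a_j X_j \Big] \right| \ge \Delta_\varepsilon(a, \mu) \right] \le \varepsilon \quad \text{subject to} \quad \mathbb{E}\Big[ \sum_j a_j X_j \Big] = \mu . \]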

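Corollary 11. For a given vector $a \in \mathbb{R}^n$ and real $\mu$, the following bounds hold:

1. For any $a$, $\Delta_\varepsilon(a, \mu) = O\left( \sqrt{\ln \tfrac{1}{\varepsilon}} \, \|a\|_2 \right)$.

2. If $a$ is non-negative and $\|a\|_\infty \le \frac{\mu}{\log(1/\varepsilon)}$, then $\Delta_\varepsilon(a, \mu) = O\left( \sqrt{\|a\|_\infty \, \mu \log \tfrac{1}{\varepsilon}} \right)$.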



Proof of item 1. Follows from Hoeffding bound. □ 

Proof of Item 2. Follows from Chernoff bound. □ 

Definition 12 (Generalized eigenvalues). Given two PSD matrices $A, B \in \mathbb{R}^{m \times m}$, $A, B \succeq 0$, $\lambda$ is a generalized eigenvalue along with the corresponding generalized eigenvector $x$ provided that $Bx \ne 0$ and

\[ Ax = \lambda Bx . \]

We will use $\lambda_i[A; B]$ to denote the $i$-th smallest generalized eigenvalue.

Observation 13. Given a PSD matrix $A$, $\lambda_i[A; \mathrm{diag}(A)]$ is equal to the $i$-th smallest eigenvalue of the matrix $\mathrm{diag}(A)^{-1/2} \cdot A \cdot \mathrm{diag}(A)^{-1/2}$, where we use the convention $0/0 = \infty$ and $0 \cdot \infty = 0$.

The result we present here is a generalization of minimum bisection. Since its proof is ex- 
tremely similar to the one presented for minimum bisection, we only provide a sketch, while 
deferring the formal claim about the rounding analysis, Theorem 31, to Section 7. 

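Theorem 14. Consider a quadratic integer programming problem

\[ \begin{array}{lll} \min & \tilde{x}^T A \tilde{x} & \\ \text{s.t.} & B \tilde{x} \ge c, & \\ & \tilde{x}_u(i) = 0 & \text{for all } (u, i) \in D, \\ & \sum_{i \in [k]} \tilde{x}_u(i) = 1 & \text{for all } u \in V, \\ & \tilde{x} \in \{0, 1\}^{V \times [k]}, & \end{array} \]

where $A$ is a PSD matrix, $A \succeq 0$, $D \subseteq V \times [k]$, $B \in \mathbb{R}^{\ell \times (V \times [k])}$, $c \in \mathbb{R}^\ell$. Further suppose that for all $i \in [\ell]$ and $u \in V$,

\[ \left| \{ j \in [k] : b_i(u, j) \ne 0 \} \right| \le 1 . \]

Moreover let $W$ be a positive real, $W > 0$, such that $\|B\|_{\max} \le 1$, $\|B\|_{\min} \ge \frac{1}{W}$ and OPT (the optimum value) is bounded from below by $\frac{1}{W}$. 6

Given $0 < \varepsilon < 1$ and a positive integer $r$, there exists an algorithm, running in time $n^{O(r/\varepsilon^2)} O(\log W)$, to find a labeling $\tilde{x}$ with objective value

\[ \tilde{x}^T A \tilde{x} \le \frac{1+\varepsilon}{\min\{1, \lambda_{r+1}\}} \mathrm{OPT}, \]

where $\lambda_{r+1} = \lambda_{r+1}[A; \mathrm{diag}(A)]$ is defined as in Definition 12.
Furthermore it satisfies the following properties:

1. For all $u \in V$: $\sum_{i \in [k]} \tilde{x}_u(i) = 1$,

2. For all $(u, i) \in D$: $\tilde{x}_u(i) = 0$,

3. For each row $b_i$ of $B$: $\langle b_i, \tilde{x} \rangle \ge c_i - \Delta_{\varepsilon/(2\ell)}(b_i, c_i)$.

6 Here $\|\cdot\|_{\max}$ and $\|\cdot\|_{\min}$ denote the maximum and minimum non-zero entry in terms of absolute value, respectively.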



Proof. We can compute a solution x to r'-rounds of Lasserre relaxation of this problem, using The- 
orem 39 with Eq = and representing each constraint x u (i) = as a monomial equality 

constraint. We know that x has the following properties: x G Lasserre^ r ' ( V x [k] , -B^ c 
objective value x T Ax = rj ^ Opt(l + 0(e)), it obeys the hard constraints 



'w^) with 
for («, i) G D. 

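Given such $x$, let $\tilde{x}$ be the rounded labeling promised by Theorem 31 (which obeys the first two properties by construction), with expected objective value

\[ \mathbb{E}[\tilde{x}^T A \tilde{x}] \le \eta \cdot \frac{1 + \varepsilon/2}{\min\{1, \lambda_{r+1}\}} . \]

Using Markov's inequality,

\[ \Pr\left[ \tilde{x}^T A \tilde{x} \le \eta \cdot \frac{1+\varepsilon}{\min\{1, \lambda_{r+1}\}} \right] \ge 1 - \frac{1 + \varepsilon/2}{1 + \varepsilon} = \Omega(\varepsilon) . \]

For $i \in [\ell]$, the probability that property 3 fails for the $i$-th linear constraint is controlled by the definition of $\Delta_\varepsilon$. Taking a union bound, we see that with probability $\Omega(\varepsilon)$, $\tilde{x} \sim \mathcal{D}^*$ will satisfy all conditions and have $\tilde{x}^T A \tilde{x} \le \eta \frac{1+\varepsilon}{\min\{1, \lambda_{r+1}\}}$. Repeating $O(n)$ many times, we can turn this into a high probability statement.

For the running time bound, the rounding algorithm takes randomized time and solving the Lasserre relaxation takes $(kn)^{O(r')} O(\log W) = n^{O(r')} O(\log W)$ time. □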



5 Algorithms for Graph Partitioning 

Our goal in this section is to give approximation schemes for Minimum Bisection, Small Set Ex- 
pansion, and various cut problems minimizing ratio of edges crossing the cut by the size / volume 
of the partition, such as Uniform Sparsest Cut, edge expansion, and conductance. Our results 
apply to weighted, not necessarily regular, graphs. Except for Minimum Bisection, it will be im- 
portant for our algorithm that we are able to handle disjoint (possibly empty) foreground and 
background sets $F$ and $B$ and find a non-expanding set $U$ satisfying $F \subseteq U \subseteq V \setminus B$. So we state
more general results with these additional constraints. 

Throughout the whole paper, when talking about graphs, we use the following convention: 
Symbol                 Stands for

$G$                    a connected, undirected weighted graph with non-negative edge weights.
$V, E, W$              nodes, edges and edge weights of $G = (V, E, W)$.
$n$                    number of nodes, $n = |V|$.
$w_e$                  weight of edge $e = \{u, v\} \in E$.
$d_u$                  (weighted) degree of node $u \in V$, $d_u = \sum_{e=\{u,v\} \in E} w_e$.
$d_{\max}$             maximum degree, $d_{\max} = \max_{u \in V} d_u$.
$m$                    total degree, $m = \sum_u d_u = \mathrm{Vol}_G(V)$.
$\mathrm{Vol}_G(U)$    total degrees of nodes in $U \subseteq V$, $\mathrm{Vol}_G(U) = \sum_{u \in U} d_u$.
$\Gamma_G(U)$          sum of weights of edges in the cut $[U, V \setminus U]$, $\Gamma_G(U) = \sum_{e=\{u,v\} \in E : u \in U, v \notin U} w_e$.
$D$                    diagonal matrix of node degrees, $D_{u,u} = d_u$, $D_{u,v} = 0$ for $u \ne v$.
$A$                    adjacency matrix of $G$, $A_{u,v} = w_{\{u,v\}}$ if $(u, v) \in E$, $0$ otherwise.
$L$                    Laplacian matrix of $G$, $L = D - A$.
$\mathcal{L}$          normalized Laplacian matrix of $G$, $\mathcal{L} = D^{-1/2} L D^{-1/2}$.
$\mathcal{A}$          normalized adjacency matrix of $G$, $\mathcal{A} = D^{-1/2} A D^{-1/2}$.
$\lambda_i(\cdot)$     $i$-th smallest eigenvalue of the given matrix.
OPT                    optimum value (in terms of minimization) for the problem in question.



For the sake of clarity, let us formally recap the objectives of each of the cut problems we will solve in this section. In all cases, the input consists of a graph $G = (V, W, E)$, and disjoint foreground and background sets $F, B \subseteq V$.

• MINIMUM BISECTION: Given an integer $\mu$ satisfying $\mu \le |V \setminus (F \cup B)|$ and $\mu \le |V|/2$, the objective function is the minimum value of $\Gamma_G(U)$ over sets $U$ such that $F \subseteq U \subseteq V \setminus B$ and $|U \setminus F| = \mu$.

• SMALL SET EXPANSION: Given an integer $\mu$ satisfying $\mu \le \mathrm{Vol}_G(V \setminus (F \cup B))$ and $\mu \le m/2$, the objective function is the minimum value of $\Gamma_G(U)$ over sets $U$ such that $F \subseteq U \subseteq V \setminus B$ and $\mathrm{Vol}_G(U \setminus F) = \mu$. 7

• UNIFORM SPARSEST CUT: The objective function is to minimize

\[ \phi_G(U) \triangleq \frac{\Gamma_G(U)}{|U| \cdot |V \setminus U|} \]

over non-empty sets $U \subsetneq V$ such that $F \subseteq U \subseteq V \setminus B$.

• EDGE EXPANSION: The objective function is to minimize

\[ h_G(U) \triangleq \frac{\Gamma_G(U)}{\min(|U|, |V \setminus U|)} \]

over non-empty sets $U \subsetneq V$ such that $F \subseteq U \subseteq V \setminus B$.

• NORMALIZED CUT: The objective function is to minimize

\[ \mathrm{ncut}_G(U) \triangleq \frac{\Gamma_G(U)}{\mathrm{Vol}_G(U) \cdot \mathrm{Vol}_G(V \setminus U)} \]

over non-empty sets $U \subsetneq V$ such that $F \subseteq U \subseteq V \setminus B$.

• CONDUCTANCE: The objective function is to minimize

\[ \mathrm{conductance}_G(U) \triangleq \frac{\Gamma_G(U)}{\min(\mathrm{Vol}_G(U), \mathrm{Vol}_G(V \setminus U))} \]

over non-empty sets $U \subsetneq V$ such that $F \subseteq U \subseteq V \setminus B$.

7 Our methods apply without change if the volume of the set $U \setminus F$ is only constrained to be in the range $[\mu(1 - \zeta), \mu(1 + \zeta)]$ for some $\zeta > 0$. For concreteness, we just focus on the exact volume case.
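The four ratio objectives differ only in their denominators, which a short sketch makes explicit. The code below assumes a graph given as a list of weighted edges `(u, v, w)` and ignores the foreground and background constraints; all function names are illustrative.

```python
def cut_weight(edges, U):
    """Gamma_G(U): total weight of edges with exactly one endpoint in U."""
    return sum(w for u, v, w in edges if (u in U) != (v in U))


def volume(edges, U):
    """Vol_G(U): sum of weighted degrees of the nodes in U."""
    return sum(w * ((u in U) + (v in U)) for u, v, w in edges)


def objectives(edges, nodes, U):
    """Evaluate the four ratio objectives for a non-empty proper subset U of nodes."""
    U = set(U)
    comp = set(nodes) - U
    gamma = cut_weight(edges, U)
    return {
        "sparsity": gamma / (len(U) * len(comp)),
        "expansion": gamma / min(len(U), len(comp)),
        "ncut": gamma / (volume(edges, U) * volume(edges, comp)),
        "conductance": gamma / min(volume(edges, U), volume(edges, comp)),
    }


# Hypothetical example: the unit-weight path 0 - 1 - 2 - 3.
PATH_EDGES = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]
PATH_NODES = [0, 1, 2, 3]
```

For the path with $U = \{0, 1\}$, a single edge is cut, both sides have two nodes and volume three, so the values are $1/4$, $1/2$, $1/9$ and $1/3$ respectively.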

5.1 Minimum Bisection 

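We will consider the following quadratic integer program formulation of Minimum Bisection

\[ \begin{array}{lll} \min_{\tilde{x}} & \sum_{e=\{u,v\} \in E} (\tilde{x}_u(1) - \tilde{x}_v(1))^2, & \\ \text{subject to} & \sum_u \tilde{x}_u(1) = \mu, & \\ & \tilde{x}_u(1) = 1 & \forall u \in F, \\ & \tilde{x}_u(2) = 1 & \forall u \in B, \\ & \tilde{x}_u(1) + \tilde{x}_u(2) = 1 & \forall u \in V, \\ & \tilde{x} \in \{0, 1\}^{V \times [2]}, & \end{array} \]

and its SDP relaxation:

\[ \begin{array}{lll} \min_{x} & \mathrm{Tr}(X(1)^T X(1) L), & \\ \text{subject to} & \sum_u \|x_{S \cup \{u\}}(f \circ 1_u)\|^2 = \mu \|x_S(f)\|^2 & \forall S \in \binom{V}{\le r'} \text{ and } f \in [2]^S, \\ & x_u(1) = x_\emptyset & \forall u \in F, \\ & x_u(2) = x_\emptyset & \forall u \in B, \\ & x \in \mathrm{Lasserre}^{(r')}(V \times [2], b^{=\mu}), & \end{array} \]

where $b$ is a vector with $b_u(1) = 1$ and $b_u(2) = 0$.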

The first result is a straightforward corollary of Theorem 14. 

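Corollary 15 (Minimum Bisection). Given $0 < \varepsilon < 1$, a positive integer $r$, a target size $\mu \le \frac{n}{2}$, and disjoint foreground and background sets (possibly empty) $F$ and $B$, respectively, with

\[ \mu \le |V \setminus (F \cup B)|, \]

there exists an algorithm, running in time $n^{O(r/\varepsilon^2)}$, to find a set $U \subseteq V$ such that:

\[ F \subseteq U \subseteq V \setminus B, \]
\[ \mu - O\left(\sqrt{\mu \log(1/\varepsilon)}\right) \le |U \setminus F| \le \mu + O\left(\sqrt{\mu \log(1/\varepsilon)}\right), \]
\[ \Gamma_G(U) \le \frac{1+\varepsilon}{\min\{\lambda_{r+1}(\mathcal{L}), 1\}} \mathrm{OPT} . \]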
Proof. Follows from Theorem 14. Note that the objective matrix takes the form

\[ L' = \begin{pmatrix} L & 0 \\ 0 & 0 \end{pmatrix} . \]

Its $(r+1)$-th smallest generalized eigenvalue $\lambda_{r+1}[L'; \mathrm{diag}(L')]$ is equal to the $(r+1)$-th smallest eigenvalue $\lambda_{r+1}(\mathcal{L})$ of $\mathcal{L}$. □



5.2 Small Set Expansion 

Our next result is on the small set expansion problem. A naive application of Theorem 14 will yield 
good bounds only when the graph does not have high degree nodes (compared to the average 
degree). However our guarantee is irrespective of the degree distribution on graph G such that 
we are always able to find a set of volume $\mu(1 \pm \varepsilon)$. In order to achieve this, the fact that we can
assign arbitrary unary constraints in the form of foreground and background sets is crucial. 

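We use the following standard integer programming formulation of SSE

\[ \begin{array}{lll} \min_{\tilde{x}} & \sum_{e=\{u,v\} \in E} (\tilde{x}_u(1) - \tilde{x}_v(1))^2, & \\ \text{subject to} & \sum_{u \in V \setminus F} d_u \tilde{x}_u(1) = \mu, & \\ & \tilde{x}_u(1) = 1 & \forall u \in F, \\ & \tilde{x}_u(2) = 1 & \forall u \in B, \\ & \tilde{x}_u(1) + \tilde{x}_u(2) = 1 & \forall u \in V, \\ & \tilde{x} \in \{0, 1\}^{V \times [2]}, & \end{array} \]

and its natural SDP relaxation under Lasserre constraints:

\[ \begin{array}{lll} \min_{x} & \mathrm{Tr}(X(1)^T X(1) L), & \\ \text{subject to} & x_u(1) = x_\emptyset & \forall u \in F, \\ & x_u(2) = x_\emptyset & \forall u \in B, \\ & x \in \mathrm{Lasserre}^{(r')}(V \times [2], b^{=\mu}), & \end{array} \]

where $b$ is a vector with $b_u(1) = d_u$ and $b_u(2) = 0$.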

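Theorem 16 (Small Set Expansion). Given $0 < \varepsilon < 1$, a positive integer $r$, a target volume $\mu$, and disjoint foreground and background sets (possibly empty) $F$ and $B$, respectively, with

\[ \mu \le \mathrm{Vol}_G(V \setminus (F \cup B)), \]

there exists an algorithm, running in time $n^{O\left(\frac{r + \log(1/\varepsilon)}{\varepsilon^2}\right)}$, to find a set $U \subseteq V$ such that:

\[ F \subseteq U \subseteq V \setminus B, \]
\[ \Gamma_G(U) \le \frac{1+\varepsilon}{\min\{\lambda_{r+1}(\mathcal{L}), 1\}} \mathrm{OPT}, \]

and

1. If $\max_{u \in V \setminus (F \cup B)} d_u \le O\left(\frac{\mu}{\log(1/\varepsilon)}\right)$, then

\[ \mu \left( 1 - O\left( \sqrt{\tfrac{d_{\max}}{\mu} \log \tfrac{1}{\varepsilon}} \right) \right) \le \mathrm{Vol}_G(U \setminus F) \le \mu \left( 1 + O\left( \sqrt{\tfrac{d_{\max}}{\mu} \log \tfrac{1}{\varepsilon}} \right) \right) . \]

(In fact, in this case the running time is bounded by $n^{O(r/\varepsilon^2)}$.)

2. Else

\[ \mu(1 - \varepsilon) \le \mathrm{Vol}_G(U \setminus F) \le \mu(1 + \varepsilon) . \]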

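Proof of 1. We follow the exact same proof as Theorem 14 using the standard integer programming formulation of SSE, and defining the cut $U = \{u \mid \tilde{x}_u(1) = 1\}$. Using bound 2 from Corollary 11, $\Delta_{\varepsilon/2}(d, \mu) \le O\left(\sqrt{d'_{\max} \, \mu \log(1/\varepsilon)}\right)$ provided that $d'_{\max} \le O\left(\frac{\mu}{\log(1/\varepsilon)}\right)$. Applying the Markov bound on $\mathbb{E}_U[\Gamma_G(U)]$, we see that with probability $\Omega(\varepsilon)$,

\[ \Gamma_G(U) \le \frac{1+\varepsilon}{\min\{\lambda_{r+1}(\mathcal{L}), 1\}} \mathrm{OPT} \]

and

\[ \mu - O\left(\sqrt{\mu \, d_{\max} \log(1/\varepsilon)}\right) \le \mathrm{Vol}_G(U \setminus F) \le \mu + O\left(\sqrt{\mu \, d_{\max} \log(1/\varepsilon)}\right) . \qquad \square \]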



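Proof of 2. At a high level, our algorithm proceeds in the following way: We enumerate all subsets $U_0$ of volume at most $\mu$ from the set of high degree nodes $\mathcal{H}$, which is defined by $\mathcal{H} \triangleq \left\{ u \mid d_u \ge \frac{\varepsilon^2 \mu}{\log(1/\varepsilon)} \right\}$. For each such subset $U_0$, we solve the corresponding Lasserre SDP relaxation of the Small Set Expansion problem on this graph with foreground set $F' = F \cup U_0$, background set $B' = B \cup (\mathcal{H} \setminus U_0)$ and target volume $\mu' = \mu - \mathrm{Vol}_G(U_0)$. Note that the maximum degree of any unconstrained node for this problem, $d'_{\max} = \max_{u \in V \setminus (F' \cup B')} d_u$, is at most $\frac{\varepsilon^2 \mu}{\log(1/\varepsilon)}$. There are two possible cases:

1. If $d'_{\max} \le \frac{\mu'}{\log(1/\varepsilon)}$, then we can apply the analysis given in the proof of 1 to find a set $U$ such that

\[ |\mathrm{Vol}_G(U_0 \cup U) - \mu| \le O\left(\sqrt{\mu' d'_{\max} \log(1/\varepsilon)}\right) \le O\left(\sqrt{\varepsilon^2 \mu' \mu}\right) \le O(\varepsilon \mu) . \]

2. If $d'_{\max} \ge \frac{\mu'}{\log(1/\varepsilon)}$, then we have $\frac{\mu'}{\log(1/\varepsilon)} \le d'_{\max} \le \frac{\varepsilon^2 \mu}{\log(1/\varepsilon)}$, which implies $\mu' \le \varepsilon^2 \mu$. In this case, instead of the Chernoff bound, we use a simple Markov bound to conclude that

\[ \Pr\left[ \mathrm{Vol}_G(U) \ge \frac{\mu'}{\varepsilon/2} \right] \le \varepsilon/2 . \]

Combining this with the bound on $\Gamma_G(U)$, with probability $\Omega(\varepsilon)$, we will find $U$ with

\[ |\mathrm{Vol}_G(U_0 \cup U) - \mu| \le \mu'/(\varepsilon/2) \le 2\varepsilon\mu . \]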
After enumerating all such sets, we return the one with the smallest cut. Correctness of this algorithm is obvious.

For the running time, note that the number of nodes we can choose from $\mathcal{H}$ is at most $\frac{\log(1/\varepsilon)}{\varepsilon^2}$. Hence we solve the Lasserre SDP at most $n^{O\left(\frac{\log(1/\varepsilon)}{\varepsilon^2}\right)}$ times. Consequently the total running time is $n^{O\left(\frac{r + \log(1/\varepsilon)}{\varepsilon^2}\right)}$.



5.3 Other Graph Partitioning Problems 

Our final problems are graph partitioning problems with a ratio in the objective, uniform sparsest 
cut, edge expansion, normalized cut and conductance. All these results are direct extensions of 
our Minimum Bisection with arbitrary target sizes or Small Set Expansion results. 

Note that the natural integer formulations for these problems involve a ratio in the objective, and even after relaxing integrality constraints, the resulting formulation is no longer an SDP. Moreover, due to the presence of Lasserre constraints, one cannot simply equate the denominator to a constant, say 1, and solve the resulting SDP. We instead guess the value of the denominator and solve the corresponding Min-Bisection or SSE problem, repeating this for all $\mathrm{poly}(n)$ possible values.
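The guessing strategy can be sketched with a brute-force stand-in for the fixed-volume subroutine. This is only an illustration under simplifying assumptions (integer edge weights, no foreground or background sets, exhaustive search instead of the Lasserre-based algorithm); the function names are hypothetical.

```python
from itertools import combinations


def cut_weight(edges, U):
    """Gamma_G(U): total weight of edges with exactly one endpoint in U."""
    return sum(w for u, v, w in edges if (u in U) != (v in U))


def volume(edges, U):
    """Vol_G(U): sum of weighted degrees of the nodes in U."""
    return sum(w * ((u in U) + (v in U)) for u, v, w in edges)


def min_cut_with_volume(edges, nodes, target):
    """Stand-in for the fixed-volume solver: lightest cut with Vol_G(U) == target."""
    best = None
    for size in range(1, len(nodes)):
        for subset in combinations(nodes, size):
            U = frozenset(subset)
            if volume(edges, U) == target:
                cand = (cut_weight(edges, U), sorted(U))
                if best is None or cand < best:
                    best = cand
    return best


def min_conductance_by_guessing(edges, nodes):
    """Guess the denominator Vol_G(U) in 1..m/2 and keep the best ratio found."""
    total = volume(edges, frozenset(nodes))
    best = None
    for target in range(1, total // 2 + 1):
        found = min_cut_with_volume(edges, nodes, target)
        if found is not None:
            gamma, U = found
            cand = (gamma / target, U)
            if best is None or cand < best:
                best = cand
    return best
```

Because every guessed volume is at most half the total, the guessed value is exactly the conductance denominator, so minimizing the numerator for each guess and comparing ratios recovers the best conductance.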

Corollary 17 (Graph Partitioning). Given $0 < \varepsilon < 1$, a positive integer $r$, and disjoint foreground and background sets (possibly empty) $F$ and $B$ respectively, there exists an algorithm, running in time $n^{O\left(\frac{r + \log(1/\varepsilon)}{\varepsilon^2}\right)}$, to find a non-empty set $U$ such that:

\[ F \subseteq U \subseteq V \setminus B, \]
\[ \{\phi_G(U), h_G(U), \mathrm{ncut}_G(U), \mathrm{conductance}_G(U)\} \le \frac{1+\varepsilon}{\min\{\lambda_{r+1}(\mathcal{L}), 1\}} \mathrm{OPT} . \]

We sketch the proof for only conductance. Proofs for other problems follow the same pattern. 

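Proof for Conductance. Let $U^*$ be the optimal solution with $\mu^* = \mathrm{Vol}_G(U^*)$ and $\eta^* = \Gamma_G(U^*)$.

We guess the volume of the optimal partition $\mu'$ and invoke Theorem 16. If we keep repeating this for all values of $\mu'$ in $\{1, 2, \ldots, \lfloor \frac{m}{2} \rfloor\}$, we will find $U$ such that

\[ |\mathrm{Vol}_G(U) - \mu^*| \le \varepsilon\mu^* + o(1) \quad \text{and} \quad \Gamma_G(U) \le \frac{1+\varepsilon}{\min\{1, \lambda_{r+1}\}} \eta^*, \]

so that $\mathrm{conductance}_G(U) \le \frac{1 + O(\varepsilon)}{\min\{1, \lambda_{r+1}\}} \mathrm{conductance}_G(U^*)$. The running time will be bounded by

\[ m \cdot n^{O\left(\frac{r + \log(1/\varepsilon)}{\varepsilon^2}\right)} . \]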
5.4 Generalizations to k-way Partitioning Problems

Note that all these results can be generalized to their respective k-way partitioning versions (wher- 
ever it makes sense). The only difference is that, in each case, the objective matrix will be a block 
diagonal matrix consisting of k copies of graph Laplacian matrix. It is easy to see that such a ma- 
trix has exactly k copies of the original eigenvalues, so instead of r rounds of Lasserre hierarchy, 
we will use k • r rounds instead. 



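Corollary 18 (Minimum k-way Section). Given $0 < \varepsilon < 1$, a positive integer $r$ and a list of target set sizes $\{\mu_i\}_{i \in [k]}$ with $\sum_i \mu_i = n$, there exists an algorithm which runs in time $n^{O(kr/\varepsilon^2)}$ to find $k$ disjoint sets $\{U_i\}_{i=1}^k$ such that:

\[ V = \bigcup_i U_i, \]
\[ \forall i : \mu_i - O\left(\sqrt{\mu_i \log(k/\varepsilon)}\right) \le |U_i| \le \mu_i + O\left(\sqrt{\mu_i \log(k/\varepsilon)}\right), \]
\[ \sum_i \Gamma_G(U_i) \le \frac{1+\varepsilon}{\min\{\lambda_{r+1}(\mathcal{L}), 1\}} \mathrm{OPT}, \]

provided that such sets exist.

Here OPT is taken as the minimum of $\sum_i \Gamma_G(U_i)$ over all $k$-way partitions of $V$ with $|U_i| = \mu_i$.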

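Proof. The proof follows by applying Theorem 14 to:

\[ \begin{array}{lll} \min_{\tilde{x}} & \sum_i \sum_{e=\{u,v\} \in E} w_e (\tilde{x}_u(i) - \tilde{x}_v(i))^2, & \\ \text{subject to} & \sum_{u \in V} \tilde{x}_u(i) = \mu_i & \text{for all } i \in [k], \\ & \sum_{i \in [k]} \tilde{x}_u(i) = 1 & \forall u \in V, \\ & \tilde{x} \in \{0, 1\}^{V \times [k]} . & \end{array} \]

Let $\hat{L}$ be the matrix in the objective. As remarked at the beginning of this section, the normalized matrix $\hat{\mathcal{L}}$ has $k$ copies of each eigenvalue of $\mathcal{L}$, so $\tilde{x}$ will satisfy

\[ \sum_i \Gamma_G(U_i) \le \frac{1 + O(\varepsilon)}{\min\{\lambda_{r/k}(\mathcal{L}), 1\}} \mathrm{OPT} \]

and the set size constraints. Choosing $U_i \leftarrow \tilde{x}^{-1}(i)$ completes the proof. □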



6 Algorithms for Unique Games Type Problems 

In this section, we obtain our algorithmic result for Unique Games type problems. Let us quickly 
recall the definition of the Unique Games problem. An instance of Unique Games consists of 
a graph $G = (V, E, W)$ with non-negative edge weights $w_e$ for each edge $e \in E$, a label set $[k]$, and bijection constraints $\pi_e : [k] \to [k]$ for each edge $e = \{u, v\}$. The goal is to find a labeling $f : V \to [k]$ that minimizes the number of unsatisfied constraints, where $e = \{u, v\}$ is unsatisfied if $\pi_e(f(u)) \ne f(v)$ (we assume the label of the lexicographically smaller vertex $u$ is projected by $\pi_e$).
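As a concrete reading of this definition, the sketch below evaluates the weight of unsatisfied constraints for a labeling and finds the optimum by exhaustive search on a tiny instance. The data layout (labels numbered from 0, permutations stored as tuples) and the function names are assumptions made for illustration.

```python
from itertools import product


def unsatisfied_weight(constraints, labeling):
    """Total weight of edges (u, v, w, pi) with pi[labeling[u]] != labeling[v]."""
    return sum(w for u, v, w, pi in constraints if pi[labeling[u]] != labeling[v])


def best_labeling(constraints, num_nodes, k):
    """Exhaustive search over all k^n labelings (only feasible for tiny instances)."""
    return min(
        (unsatisfied_weight(constraints, f), f)
        for f in product(range(k), repeat=num_nodes)
    )


# Hypothetical instance: a triangle with k = 2. Two edges demand equal labels,
# the third demands different labels, so one constraint must be violated.
IDENTITY = (0, 1)
SWAP = (1, 0)
TRIANGLE = [
    (0, 1, 1.0, IDENTITY),
    (1, 2, 1.0, IDENTITY),
    (0, 2, 1.0, SWAP),
]
```

On `TRIANGLE` no labeling satisfies all three constraints, so the optimum value of the minimization objective is the weight of a single edge.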

Remark 8. Unique Games can also be captured in the quadratic integer programming framework 
of Section 4, where the matrix A defining the PSD quadratic form corresponds to the Laplacian of 
the "lifted graph" G with vertex set V x [k] obtained by replacing each edge in G by a matching 
corresponding to its permutation constraint. However, except for the problem of maximum cut, 
we are unable to apply the results from that section directly because there is no known way to 
relate the $r$-th eigenvalue of the constraint graph to, say, the $\mathrm{poly}(r)$-th eigenvalue of the lifted graph.
Hence we use the "projection distance" type bound based on column selection (similar to Section 
3.4), after constructing an appropriate embedding to relate the problem to the original graph. □ 

Remark 9. Although we do not explicitly mention in the theorem statements, we can provide 
similar guarantees in the presence of constraints similar to graph partitioning problems such as 

• constraining labels available to each node, 

• constraining fraction of labels used among different subsets of nodes. 

For example, the guarantee for maximum cut algorithm immediately carries over to maximum 
bisection with guarantees on partition sizes similar to minimum bisection. □ 



6.1 Maximum cut 

We first start with the simplest problem fitting in the framework for unique games — finding a 
maximum cut in a graph. We use the following standard integer programming formulation. Note 
that this formulation is for the complementary objective of finding minimum uncut: 

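\[ \begin{array}{lll} \min_{\tilde{x}} & \sum_{e=\{u,v\} \in E} w_e \cdot \frac{1}{2} \left[ (\tilde{x}_u(1) - \tilde{x}_v(2))^2 + (\tilde{x}_u(2) - \tilde{x}_v(1))^2 \right], & \\ \text{subject to} & \tilde{x}_u(1) + \tilde{x}_u(2) = 1 & \forall u \in V, \\ & \tilde{x} \in \{0, 1\}^{V \times [2]}, & \end{array} \]

and its natural SDP relaxation under Lasserre constraints:

\[ \begin{array}{ll} \min_{x} & \frac{1}{2} \mathrm{Tr}\left[ \begin{pmatrix} X(1)^T X(1) & X(1)^T X(2) \\ X(2)^T X(1) & X(2)^T X(2) \end{pmatrix} \begin{pmatrix} D & -A \\ -A^T & D \end{pmatrix} \right] \\ \text{subject to} & x \in \mathrm{Lasserre}^{(r')}(V \times [2]) . \end{array} \]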



Theorem 19 (Maximum Cut / Minimum Uncut). Given a weighted undirected graph $G = (V, E, W)$, for all $\varepsilon \in (0, 1)$ and a positive integer $r$, there exists an algorithm to find a set $U \subseteq V$ such that the total weight of uncut edges by partitioning $(U, V \setminus U)$ is bounded by

\[ \min\left\{ 1 + \frac{2+\varepsilon}{\lambda_{r+1}(\mathcal{L})}, \; \frac{1+\varepsilon}{\min\{2 - \lambda_{n-r-1}(\mathcal{L}), 1\}} \right\} \cdot \mathrm{OPT} \]

in time $n^{O(1)} 2^{O(r/\varepsilon^2)}$, where OPT is the total weight of uncut edges in the optimal labeling.

Proof. The first bound will follow from the more general result for Unique Games (Theorem 20 below), so we focus on the second bound claiming an approximation ratio of $(1+\varepsilon)/\min\{2 - \lambda_{n-r-1}, 1\}$.

The Laplacian matrix $\hat{L}$ corresponding to the lifted graph $\hat{G}$ for maximum cut can be expressed as:

\[ \hat{L} = \begin{pmatrix} D & -A \\ -A^T & D \end{pmatrix} = \begin{pmatrix} D & -A \\ -A & D \end{pmatrix}, \]

whose normalized Laplacian matrix is given by

\[ \hat{\mathcal{L}} = \begin{pmatrix} I & -\mathcal{A} \\ -\mathcal{A} & I \end{pmatrix} . \]

By direct substitution, it is easy to see that, for every eigenvector $q_i$ of the constraint graph's normalized Laplacian matrix $\mathcal{L}$, there are two corresponding eigenvectors for $\hat{\mathcal{L}}$, $\begin{pmatrix} q_i \\ q_i \end{pmatrix}$ and $\begin{pmatrix} q_i \\ -q_i \end{pmatrix}$, with corresponding eigenvalues given by $\lambda_i$ and $2 - \lambda_i$ respectively. As a convention, we will refer to the first type of eigenvectors as even eigenvectors and the latter type as odd eigenvectors.

For any node $u \in V$, we can express $x_u(i)$ for $i \in [2]$ as

\[ x_u(i) = \|x_u(i)\|^2 x_\emptyset + (-1)^i \|x_u(1)\| \|x_u(2)\| y_u, \]

where $y_u$ is a unit vector orthogonal to $x_\emptyset$, $\langle x_\emptyset, y_u \rangle = 0$. For any set $S$, writing $x_\emptyset^\perp$ for the projection onto the orthogonal complement of $x_\emptyset$, $\Pi_S^\perp x_u(1) = \Pi_S^\perp (x_\emptyset^\perp x_u(1)) = -\|x_u(1)\| \|x_u(2)\| \, \Pi_S^\perp y_u = -\Pi_S^\perp x_u(2)$. Consequently $X^T \Pi_S^\perp X$ has zero correlation with even eigenvectors of $\hat{L}$. Therefore we have the following identity:

\[ \mathrm{Tr}(X^T \Pi_S^\perp X \hat{L}) = 2 \, \mathrm{Tr}(X(1)^T \Pi_S^\perp X(1) (D + A)) . \]

In particular, we can slightly modify Theorem 31 to take into account only the eigenvectors of $\hat{L}$ with which $x_\emptyset^\perp X$ has non-zero correlation. Using our standard rounding procedure, we can then find a set $U$ for which the weight of "uncut" edges is bounded by $\frac{1+\varepsilon}{\min\{\lambda_{r+1}(I + \mathcal{A}), 1\}} \cdot \mathrm{OPT}$. The proof is now complete by noting that $\lambda_{r+1}(I + \mathcal{A}) = 2 - \lambda_{n-r-1}(\mathcal{L})$. □
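The "direct substitution" behind the even and odd eigenvectors is easy to check numerically. The sketch below builds the lifted normalized Laplacian with off-diagonal blocks $-\mathcal{A}$ for a hypothetical example (a unit-weight triangle) and tests candidate eigenpairs by matrix-vector multiplication; the helper names are illustrative.

```python
def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def lifted_normalized_laplacian(norm_adj):
    """Build the 2n x 2n block matrix [[I, -A], [-A, I]] for normalized adjacency A."""
    n = len(norm_adj)
    M = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        M[i][i] = 1.0
        M[n + i][n + i] = 1.0
        for j in range(n):
            M[i][n + j] = -norm_adj[i][j]
            M[n + i][j] = -norm_adj[i][j]
    return M


def is_eigenpair(M, v, lam, tol=1e-9):
    """Check M v = lam v entrywise."""
    return all(abs(a - lam * b) <= tol for a, b in zip(matvec(M, v), v))


# Hypothetical example: unit-weight triangle, so the normalized adjacency is A / 2.
TRIANGLE_NORM_ADJ = [
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
    [0.5, 0.5, 0.0],
]
# q is an eigenvector of I - A/2 with eigenvalue 1.5.
Q = [1.0, -1.0, 0.0]
```

For the triangle, the stacked vector $(q, q)$ has eigenvalue $1.5$ and $(q, -q)$ has eigenvalue $2 - 1.5 = 0.5$ for the lifted matrix.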

6.2 Unique Games 

In this section, we prove our main result for approximating Unique Games. We consider the 
following IP formulation: 



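\[ \begin{array}{lll} \min_{\tilde{x}} & \sum_{e=\{u,v\} \in E} w_e \cdot \frac{1}{2} \sum_{i \in [k]} (\tilde{x}_u(i) - \tilde{x}_v(\pi_e(i)))^2 & \\ \text{subject to} & \sum_{i \in [k]} \tilde{x}_u(i) = 1 & \forall u \in V, \\ & \tilde{x} \in \{0, 1\}^{V \times [k]}, & \end{array} \]

and its natural SDP relaxation under Lasserre constraints:

\[ \begin{array}{ll} \min_{x} & \frac{1}{2} \mathrm{Tr}(X^T X \hat{L}) \\ \text{subject to} & x \in \mathrm{Lasserre}^{(r')}(V \times [k]) . \end{array} \]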

Theorem 20 (Unique Games). Let $G = (V, E, W)$ be an instance of Unique Games with label set $[k]$ and permutation constraints $\pi_e$ for each $e \in E$.

Then for all $\varepsilon \in (0, 1)$ and positive integer $r$, there exists an algorithm to find a labeling of nodes $V$, $f : V \to [k]$, with total weight of unsatisfied constraints bounded by

\[ \sum_{e=\{u,v\} \in E} w_e \cdot \mathbb{1}[\pi_e(f(u)) \ne f(v)] \le \left( 1 + \frac{2+\varepsilon}{\lambda_{r+1}(\mathcal{L})} \right) \mathrm{OPT} \]

in time , where OPT is the total weight of unsatisfied constraints in the optimal labeling. 

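Proof. Let $x$ be vectors satisfying $r' = O(r/\varepsilon)$ levels of Lasserre hierarchy constraints with

\[ \eta \triangleq \frac{1}{4} \sum_{e=(u,v) \in E} w_e \sum_f \|x_u(f) - x_v(\pi_e(f))\|^2 , \]

where for notational convenience we treat each undirected edge $\{u, v\}$ as two directed edges of half the weight. Let $S = S^* \in \binom{V}{r'}$, to be chosen later. By choosing the labeling from $f \sim \mathcal{D}^*$, we know by using Claim 28 that the expected weight of unsatisfied constraints is bounded by:

\[ \frac{1}{2} \sum_{e=(u,v) \in E} w_e \Pr_{f \sim \mathcal{D}^*}[f(v) \ne \pi_e(f(u))] = \frac{1}{2} \sum_{e=(u,v) \in E} w_e \sum_f \langle \Pi_{S^*} x_u(f), \Pi_{S^*} (x_\emptyset - x_v(\pi_e(f))) \rangle \]
\[ = \frac{1}{2} \sum_{e=(u,v) \in E} w_e \sum_f \left( \|x_u(f)\|^2 - \langle x_u(f), x_v(\pi_e(f)) \rangle + \langle \Pi_S^\perp x_u(f), \Pi_S^\perp x_v(\pi_e(f)) \rangle \right) \]
\[ = \eta + \frac{1}{2} \sum_{e=(u,v) \in E} w_e \sum_f \langle \Pi_S^\perp x_u(f), \Pi_S^\perp x_v(\pi_e(f)) \rangle \]
\[ \le \eta + \frac{1}{2} \sum_{e=(u,v) \in E} w_e \sum_f \frac{\|\Pi_S^\perp x_u(f)\|^2 + \|\Pi_S^\perp x_v(\pi_e(f))\|^2}{2} \]
\[ = \eta + \frac{1}{2} \sum_u d_u \sum_f \|\Pi_S^\perp x_u(f)\|^2 . \]

If we let $P_S$ be the projection matrix onto the span of $\{x_v(f)\}_{v \in S, f \in [k]}$, the above is upper bounded by

\[ \eta + \frac{1}{2} \sum_u d_u \sum_f \|P_S^\perp x_u(f)\|^2 . \]

Therefore the total weight of unsatisfied constraints is bounded by:

\[ \eta \left( 1 + \frac{2 \sum_u d_u \sum_f \|P_S^\perp x_u(f)\|^2}{\sum_{e=(u,v) \in E} \sum_f w_e \|x_u(f) - x_v(\pi_e(f))\|^2} \right) . \]

Consider the embedding $\{x_u(f)\}_f \mapsto X_u$ given in Theorem 21 below. For this embedding, we know that, for any set $S$, the above quantity is bounded by

\[ \eta \left( 1 + \frac{2 \sum_u d_u \|X_S^\perp X_u\|^2}{\sum_{e=(u,v) \in E} w_e \|X_u - X_v\|^2} \right) = \eta \left( 1 + \frac{2 \, \mathrm{Tr}(X^T X_S^\perp X D)}{2 \, \mathrm{Tr}(X^T X L)} \right) . \]

If we further scale $X$ by $D^{1/2}$ so that $X' = D^{1/2} X$, this becomes

\[ \eta \left( 1 + \frac{\mathrm{Tr}(X'^T X_S'^\perp X')}{\mathrm{Tr}(X'^T X' \mathcal{L})} \right), \]

where $\mathcal{L}$ is the normalized Laplacian matrix. Picking $S^* = \arg\min_{S \in \binom{V}{r'}} \mathrm{Tr}(X'^T X_S'^\perp X')$, and applying Lemma 30, we obtain the desired result. □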

Theorem 21 (A useful embedding). Given vectors $x \in \mathbb{R}^{m \times (V \times [k])}$ with the property that, for any $u \in V$, whenever $f, g \in [k]^u$ are two different labellings of $u$, $f \neq g$,

\[
\langle x_u(f), x_u(g) \rangle = 0,
\]

there exists an embedding $\{x_u(f)\}_f \mapsto X_u$ with the following properties:

1. For any $u \in V$, $\|X_u\|^2 = \sum_f \|x_u(f)\|^2$.

2. For any $u, v \in V$ and any permutation $\pi \in \mathrm{Sym}([k])$:

\[
\sum_{i\in[k]} \|x_u(i) - x_v(\pi(i))\|^2 \ge \frac{1}{2} \|X_u - X_v\|^2 .
\]

3. For any set $S \subseteq V$ and any node $u \in V$, if we let $P_S$ be the projection matrix onto the span of $\{x_s(f)\}_{s\in S, f\in[k]}$:

\[
\|X_S^{\perp} X_u\|^2 \ge \sum_{f\in[k]^u} \|P_S^{\perp} x_u(f)\|^2 .
\]

In the rest of this section, we will prove Theorem 21.

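Our embedding is as follows. Assume that the vectors $x_u(f)$ belong to $\mathbb{R}^m$. Let $e_1, e_2, \ldots, e_m \in \mathbb{R}^m$ be the standard basis vectors. Define $X_u \in \mathbb{R}^m \otimes \mathbb{R}^m$ as

\[
X_u = \sum_{i=1}^{m} \sum_{f\in[k]^u} \langle \overline{x}_u(f), e_i \rangle \, x_u(f) \otimes e_i ,
\]

where $\overline{x} = x / \|x\|$ denotes the unit vector in the direction of $x$.

Observation 22. For vectors $x, y \in \mathbb{R}^m$, $\sum_i \langle x, e_i \rangle \langle y, e_i \rangle = \langle x, y \rangle$.

The first property of the vectors $X_u$ follows from this observation easily:

\begin{align*}
\|X_u\|^2 &= \sum_i \sum_{f,g} \langle \overline{x}_u(f), e_i \rangle \langle \overline{x}_u(g), e_i \rangle \langle x_u(f), x_u(g) \rangle \\
&= \sum_{f,g} \langle x_u(f), x_u(g) \rangle \sum_i \langle \overline{x}_u(f), e_i \rangle \langle \overline{x}_u(g), e_i \rangle \\
&= \sum_{f,g} \langle x_u(f), x_u(g) \rangle \langle \overline{x}_u(f), \overline{x}_u(g) \rangle \\
&= \sum_f \|x_u(f)\|^2 ,
\end{align*}

where the last step uses the orthogonality of $x_u(f)$ and $x_u(g)$ for $f \neq g$.

We prove the second property in Claim 23 and the third one in Claim 24.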
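As a quick numerical sanity check, the following Python sketch builds the embedding in the equivalent form $X_u = \sum_f x_u(f) \otimes \overline{x}_u(f)$ and compares both sides of the first two properties of Theorem 21 on a toy instance. The helper names (`unit`, `embed`, `sq_dist`) and the vectors `x_u`, `x_v` are hypothetical and chosen only for illustration; it checks one example and is not a proof.

```python
import math

def unit(x):
    """Return x / ||x|| (or x itself if it is the zero vector)."""
    norm = math.sqrt(sum(c * c for c in x))
    return [c / norm for c in x] if norm > 0 else list(x)

def embed(vectors):
    """Flattened X_u = sum_f x_u(f) (tensor) unit(x_u(f)) for one node u.

    `vectors` lists x_u(f) for every label f; they are assumed to be
    pairwise orthogonal, as in the hypothesis of Theorem 21.
    """
    m = len(vectors[0])
    X = [0.0] * (m * m)
    for x in vectors:
        xb = unit(x)
        for a in range(m):
            for i in range(m):
                X[a * m + i] += x[a] * xb[i]
    return X

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

# Hypothetical toy data: two labels per node, vectors in R^3.
x_u = [[0.6, 0.0, 0.0], [0.0, 0.8, 0.0]]
x_v = [[0.8, 0.0, 0.0], [0.0, 0.0, 0.6]]
X_u, X_v = embed(x_u), embed(x_v)

# Property 1: ||X_u||^2 versus sum_f ||x_u(f)||^2.
norm_sq_X_u = sum(c * c for c in X_u)
sum_norms_u = sum(sum(c * c for c in x) for x in x_u)

# Property 2 (identity permutation): (1/2)||X_u - X_v||^2 versus
# sum_f ||x_u(f) - x_v(f)||^2.
lhs = 0.5 * sq_dist(X_u, X_v)
rhs = sum(sq_dist(a, b) for a, b in zip(x_u, x_v))
```

Flattening the tensor product into a length-$m^2$ list keeps the sketch dependency-free; any linear algebra library would serve equally well.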
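Claim 23. For any permutation $\pi \in \mathrm{Sym}([k])$:

\[
\frac{1}{2} \|X_u - X_v\|^2 \le \sum_{i\in[k]} \|x_u(i) - x_v(\pi(i))\|^2 .
\]

Proof. Without loss of generality we assume $\pi$ is the identity permutation. We have

\begin{align*}
\frac{1}{2} \|X_u - X_v\|^2 &= \frac{\|X_u\|^2 + \|X_v\|^2}{2} - \langle X_u, X_v \rangle \\
&= \frac{\|X_u\|^2 + \|X_v\|^2}{2} - \sum_{f,g} \langle x_u(f), x_v(g) \rangle \sum_i \langle \overline{x}_u(f), e_i \rangle \langle \overline{x}_v(g), e_i \rangle \\
&= \sum_f \frac{\|x_u(f)\|^2 + \|x_v(f)\|^2}{2} - \sum_{f,g} \frac{\langle x_u(f), x_v(g) \rangle^2}{\|x_u(f)\| \, \|x_v(g)\|} .
\end{align*}

Every term of the sum over pairs $(f, g)$ is nonnegative, so the sum over all pairs is lower bounded by summing only the corresponding pairs $f = g$:

\begin{align*}
&\le \frac{1}{2} \sum_f \Bigl( \|x_u(f)\|^2 + \|x_v(f)\|^2 - 2 \langle x_u(f), x_v(f) \rangle \langle \overline{x}_u(f), \overline{x}_v(f) \rangle \Bigr) \\
&= \frac{1}{2} \sum_f \|x_u(f) - x_v(f)\|^2 + \sum_f \langle x_u(f), x_v(f) \rangle \underbrace{\bigl( 1 - \langle \overline{x}_u(f), \overline{x}_v(f) \rangle \bigr)}_{\ge 0} . \tag{8}
\end{align*}

Since the coefficient of $\langle x_u(f), x_v(f) \rangle$ is nonnegative, we can use the Cauchy-Schwarz inequality to replace $\langle x_u(f), x_v(f) \rangle$ with $\|x_u(f)\| \cdot \|x_v(f)\|$ in Equation (8) to obtain:

\[
\le \frac{1}{2} \sum_f \|x_u(f) - x_v(f)\|^2 + \sum_f \|x_u(f)\| \cdot \|x_v(f)\| \bigl( 1 - \langle \overline{x}_u(f), \overline{x}_v(f) \rangle \bigr) . \tag{9}
\]

Using the inequality $\|x_u(f)\| \cdot \|x_v(f)\| \le \frac{1}{2} \bigl( \|x_u(f)\|^2 + \|x_v(f)\|^2 \bigr)$ on Equation (9):

\begin{align*}
&\le \frac{1}{2} \sum_f \Bigl( \|x_u(f) - x_v(f)\|^2 + \|x_u(f)\|^2 + \|x_v(f)\|^2 - 2 \langle x_u(f), x_v(f) \rangle \Bigr) \\
&= \sum_f \|x_u(f) - x_v(f)\|^2 . \qquad □
\end{align*}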
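Claim 24.

\[
\|X_S^{\perp} X_u\|^2 \ge \sum_f \|P_S^{\perp} x_u(f)\|^2 .
\]

Proof. For any $\theta \in \mathbb{R}^S$,

\[
\Bigl\| X_u - \sum_{v\in S} \theta_v X_v \Bigr\|^2 = \sum_{i=1}^{m} \Bigl\| \sum_f \langle \overline{x}_u(f), e_i \rangle x_u(f) - \underbrace{\sum_{v\in S, g} \theta_v \langle \overline{x}_v(g), e_i \rangle x_v(g)}_{\in\, \mathrm{span}(P_S)} \Bigr\|^2 . \tag{10}
\]

Substituting $a_f = P_S^{\perp} x_u(f)$ and $\beta_f = P_S x_u(f)$, and noting that the subtracted vector lies in the range of $P_S$, Equation (10) is equal to:

\begin{align*}
&\sum_{i=1}^{m} \Bigl\| \sum_f \langle \overline{x}_u(f), e_i \rangle a_f \Bigr\|^2 + \sum_{i=1}^{m} \Bigl\| \sum_f \langle \overline{x}_u(f), e_i \rangle \beta_f - \sum_{v\in S, g} \theta_v \langle \overline{x}_v(g), e_i \rangle x_v(g) \Bigr\|^2 \\
&\ge \sum_{i=1}^{m} \Bigl\| \sum_f \langle \overline{x}_u(f), e_i \rangle a_f \Bigr\|^2 \\
&= \sum_{f,g} \langle a_f, a_g \rangle \sum_{i=1}^{m} \langle \overline{x}_u(f), e_i \rangle \langle \overline{x}_u(g), e_i \rangle \\
&= \sum_{f,g} \langle a_f, a_g \rangle \langle \overline{x}_u(f), \overline{x}_u(g) \rangle = \sum_f \|a_f\|^2 = \sum_f \|P_S^{\perp} x_u(f)\|^2 .
\end{align*}

Since $\|X_S^{\perp} X_u\|^2$ is the minimum of the left-hand side of (10) over all $\theta$, the claim follows. □

This concludes the proof of Theorem 21, and therefore also the proof of Theorem 20.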

7 The Main Rounding Algorithm and Its Analysis 



In this section we state and prove the main results concerning our rounding algorithm for Lasserre 
SDP solutions, and in particular prove Theorem 31 which we used to analyze our algorithm for 
quadratic integer programming and its applications to graph partitioning. Some of this discussion 
already appeared in the simpler setting of Minimum Bisection in Section 3. All our rounding 
algorithms are based on choosing labels of a carefully chosen "seed" set S* of appropriate size r', 
which is then propagated to other nodes conditioned on the particular labeling of S* . 

For easy reference, we describe the rounding procedure in Algorithm 1 and the seed selection 
procedure in Algorithm 2. 

7.1 Simple lemmas about rounding 



We first describe how to perform the rounding after a good choice of the seed set $S^*$ has been made, followed by an analysis of its properties. This part is quite simple; the crux of our rounding is how to choose the best $S^*$ and bound the performance when it is used as the seed set. This will be described in Section 7.2.

Algorithm 1 Algorithm for labeling in time $O(k^{r'} + n)$.

Input: $S^* \subseteq V$ of size at most $r'$, $x \in \mathrm{Lasserre}^{(r')}(V \times [k])$.

Output: $\tilde{x} \in \{0,1\}^{V \times [k]}$.

Procedure:

1. Choose $f \in [k]^{S^*}$ with probability $\|x_{S^*}(f)\|^2$.

2. Label every node $u \in V$ by choosing a label $j \in [k]$ with probability $\|x_{S^* \cup \{u\}}(f \circ j)\|^2 / \|x_{S^*}(f)\|^2$.

Algorithm 2 Algorithm for finding seed set in time $O(n^5)$ deterministically.

Input: Positive integers $r$, $r' = O(r/\varepsilon^2)$, $x \in \mathrm{Lasserre}^{(r')}(V \times [k])$ and a PSD matrix $L \in \mathbb{R}^{(V\times[k]) \times (V\times[k])}$.

Output: Seed set $S^* \subseteq V$ of size at most $r'$ satisfying Equation (11).

Procedure:

1. Let $S^* \leftarrow \emptyset$.

2. Repeat until $S^*$ satisfies Equation (11):

(a) Find $r/\varepsilon$ new seeds $\tilde{T} \in \binom{V \times [k]}{r/\varepsilon}$ using the deterministic column selection algorithm given in [GS11] on the matrix $\Pi^{\perp}_{S^*} X \, \mathrm{diag}(L)^{1/2}$.

(b) $T \leftarrow \{ u \mid \exists j \in [k] : (u, j) \in \tilde{T} \}$.

(c) $S^* \leftarrow S^* \cup T$.

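Definition 25 (Rounding distribution). Given $x \in \mathrm{Lasserre}^{(r')}(V \times [k])$ and $S^* \in \binom{V}{\le r'}$, we define $\mathcal{D}^*$ as the distribution on labellings of $S^*$, in which a labeling $f \in [k]^{S^*}$ is chosen with probability:

\[
\Pr_{f' \sim \mathcal{D}^*} [f' = f] = \|x_{S^*}(f)\|^2 .
\]

Here $f \sim \mathcal{D}^*$ denotes choosing $f$ from distribution $\mathcal{D}^*$.

For any $f \in [k]^{S^*}$, we use $\mathcal{D}^*_f$ as the distribution on binary vectors corresponding to labellings of $V$, $\{0,1\}^{V \times [k]}$, in which each node $u \in V$ receives, independently at random, a label $g \in [k]^u$ with probability:

\[
\Pr_{\tilde{x} \sim \mathcal{D}^*_f} [\tilde{x}_u(g) = 1] = \frac{\|x_{S^* \cup \{u\}}(f \circ g)\|^2}{\|x_{S^*}(f)\|^2} = \frac{\langle x_{S^*}(f), x_u(g) \rangle}{\|x_{S^*}(f)\|^2} .
\]

We will abuse the notation and use $\tilde{x} \sim \mathcal{D}^*$ for sampling a binary labeling vector by first choosing $f \sim \mathcal{D}^*$ and then choosing $\tilde{x} \sim \mathcal{D}^*_f$.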

We now prove some simple properties of this rounding. All claims below hold for every fixed 
choice of S*. 
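The two-stage sampling of Definition 25 can be illustrated with a minimal Python sketch. As a simplifying assumption (not the general setting of the paper), a genuine probability distribution over labelings stands in for the Lasserre vectors, so that $\|x_S(f)\|^2$ is just the probability that the labeling agrees with $f$ on $S$. The names `marginal`, `round_labeling`, `rounded_marginal` and the toy distribution `dist` are hypothetical.

```python
import random
from itertools import product

def marginal(dist, nodes, labels):
    """Pr[labeling agrees with `labels` on `nodes`]; plays the role of ||x_S(f)||^2."""
    return sum(p for lab, p in dist.items()
               if all(lab[u] == l for u, l in zip(nodes, labels)))

def round_labeling(dist, n, k, seed, rng):
    """Sample a labeling of n nodes as in Definition 25, given a seed set."""
    # Step 1: choose f on the seed set with probability ||x_S(f)||^2.
    choices = list(product(range(k), repeat=len(seed)))
    weights = [marginal(dist, seed, f) for f in choices]
    f = rng.choices(choices, weights=weights)[0]
    base = marginal(dist, seed, f)
    # Step 2: label every node independently from its marginal conditioned on f.
    labeling = []
    for u in range(n):
        w = [marginal(dist, list(seed) + [u], f + (j,)) / base for j in range(k)]
        labeling.append(rng.choices(range(k), weights=w)[0])
    return labeling

def rounded_marginal(dist, k, seed, u, g):
    """Exact probability that the rounding assigns label g to node u."""
    total = 0.0
    for f in product(range(k), repeat=len(seed)):
        base = marginal(dist, seed, f)
        if base > 0:
            total += base * (marginal(dist, list(seed) + [u], f + (g,)) / base)
    return total

# Hypothetical toy distribution over labelings of 3 nodes with 2 labels.
dist = {(0, 0, 1): 0.5, (1, 1, 0): 0.3, (0, 1, 1): 0.2}
sample = round_labeling(dist, n=3, k=2, seed=[0], rng=random.Random(0))
```

Here `rounded_marginal` computes the single-node marginals of the rounded labeling exactly; for this stand-in they coincide with the input marginals, whatever the seed set.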
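Claim 26. For any $u \in V$ and $g \in [k]^u$, we have

\[
\Pr_{\tilde{x} \sim \mathcal{D}^*} [\tilde{x}_u(g) = 1] = \|x_u(g)\|^2 .
\]

Proof. Indeed, by definition of the rounding scheme, $\Pr_{\tilde{x} \sim \mathcal{D}^*} [\tilde{x}_u(g) = 1]$ equals

\[
\sum_f \|x_{S^*}(f)\|^2 \cdot \frac{\|x_{S^* \cup \{u\}}(f \circ g)\|^2}{\|x_{S^*}(f)\|^2} = \sum_f \|x_{S^* \cup \{u\}}(f \circ g)\|^2 = \|x_u(g)\|^2 . \qquad □
\]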

Before stating the next claim, let us again recall the definition of the projection operator used 
in the analysis of the rounding. 

Definition 27. Given $x \in \mathrm{Lasserre}^{(r')}(V \times [k])$, we define $\Pi_S \in \mathbb{R}^{\Upsilon \times \Upsilon}$ as the projection matrix onto the span of $\{x_S(f)\}_{f\in[k]^S}$ for given $S$:

\[
\Pi_S = \sum_{f\in[k]^S} \overline{x}_S(f) \cdot \overline{x}_S(f)^T .
\]

Define $\Pi_S^{\perp} = I - \Pi_S$ to be the projection matrix onto the orthogonal complement of the span of $\{x_S(f)\}_{f\in[k]^S}$, where $I$ denotes the identity matrix of appropriate dimension.

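Claim 28. For any $u \neq v \in V$ and $g \in [k]^u$, $h \in [k]^v$:

\[
\Pr_{\tilde{x} \sim \mathcal{D}^*} [\tilde{x}_u(g) = 1 \wedge \tilde{x}_v(h) = 1] = \langle \Pi_{S^*} x_u(g), \Pi_{S^*} x_v(h) \rangle .
\]

Proof.

\begin{align*}
\Pr_{\tilde{x} \sim \mathcal{D}^*} [\tilde{x}_u(g) = 1 \wedge \tilde{x}_v(h) = 1]
&= \sum_f \|x_{S^*}(f)\|^2 \, \frac{\|x_{S^* \cup \{u\}}(f \circ g)\|^2 \, \|x_{S^* \cup \{v\}}(f \circ h)\|^2}{\|x_{S^*}(f)\|^4} \\
&= \sum_f \frac{\|x_{S^* \cup \{u\}}(f \circ g)\|^2 \, \|x_{S^* \cup \{v\}}(f \circ h)\|^2}{\|x_{S^*}(f)\|^2} \\
&= \sum_f \frac{\langle x_{S^*}(f), x_u(g) \rangle \langle x_{S^*}(f), x_v(h) \rangle}{\|x_{S^*}(f)\|^2} \\
&= \sum_f \langle \overline{x}_{S^*}(f), x_u(g) \rangle \langle \overline{x}_{S^*}(f), x_v(h) \rangle \\
&= x_u(g)^T \Pi_{S^*} x_v(h) = \langle \Pi_{S^*} x_u(g), \Pi_{S^*} x_v(h) \rangle . \qquad □
\end{align*}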

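Claim 29. Given any $S^*$ with $\tilde{x}$ sampled from $\mathcal{D}^*$ as described, the following identity holds: For any matrix $L \in \mathbb{R}^{(V\times[k])^2}$,

\[
\mathbb{E}_{\tilde{x} \sim \mathcal{D}^*} [\tilde{x}^T L \tilde{x}] = \mathrm{Tr}(X^T \Pi^{\perp}_{S^*} X \, \mathrm{diag}(L)) + \mathrm{Tr}(X^T \Pi_{S^*} X L) .
\]

Proof. Write $L = \mathrm{diag}(L) + L^{\circ}$, where $L^{\circ}$ is the off-diagonal part of $L$:

\[
\mathbb{E}_{\tilde{x} \sim \mathcal{D}^*} [\tilde{x}^T L \tilde{x}] = \mathbb{E}_{\tilde{x} \sim \mathcal{D}^*} [\tilde{x}^T \mathrm{diag}(L) \tilde{x} + \tilde{x}^T L^{\circ} \tilde{x}] .
\]

Using Claim 26 and Claim 28:

\begin{align*}
&= \mathrm{Tr}(X^T X \, \mathrm{diag}(L)) + \mathrm{Tr}(X^T \Pi_{S^*} X L^{\circ}) \\
&= \mathrm{Tr}(X^T X \, \mathrm{diag}(L)) + \mathrm{Tr}(X^T \Pi_{S^*} X (L - \mathrm{diag}(L))) \\
&= \mathrm{Tr}(X^T \Pi^{\perp}_{S^*} X \, \mathrm{diag}(L)) + \mathrm{Tr}(X^T \Pi_{S^*} X L) . \qquad □
\end{align*}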



7.2 Choosing A Good Seed Set 

In this section we show how to pick a good S* and prove our main result, Theorem 31, which lets 
us relate the performance of our rounding algorithm to the objective value of relaxation. 

We begin with a lemma relating the best bound achieved by column-selection for a matrix $X$ (as in Theorem 7) to the objective function $\mathrm{Tr}(X^T X L)$ with respect to an arbitrary PSD matrix $L$.

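Lemma 30. Given $X \in \mathbb{R}^{m \times n}$ and a PSD matrix $L \in \mathbb{R}^{n \times n}$, $L \succeq 0$, for any positive integer $r$ and positive constant $\varepsilon > 0$, there exist $r/\varepsilon$ columns, $S \in \binom{[n]}{r/\varepsilon}$, of $X$ such that

\[
\mathrm{Tr}(X^T X_S^{\perp} X \, \mathrm{diag}(L)) \le \frac{\mathrm{Tr}(X^T X L)}{(1 - \varepsilon) \lambda_{r+1}[L; \mathrm{diag}(L)]} ,
\]

where $\lambda_{r+1}[L; \mathrm{diag}(L)]$ is the $(r+1)$th smallest generalized eigenvalue as defined in Definition 12. Furthermore such $S$ can be found in deterministic $O(rn^4)$ time.

Proof. Let $\tilde{X} \leftarrow X \, \mathrm{diag}(L)^{1/2}$ and $\mathcal{L} \leftarrow \mathrm{diag}(L)^{-1/2} L \, \mathrm{diag}(L)^{-1/2}$ with the convention $0/0 = \infty$ and $0 \cdot \infty = 0$. Note that the $i$th smallest eigenvalue of $\mathcal{L}$, $\lambda_i(\mathcal{L})$, corresponds to the $i$th smallest generalized eigenvalue $\lambda_i[L; \mathrm{diag}(L)]$ by Observation 13. If we let $\sigma_i$ be the $i$th largest eigenvalue of $\tilde{X}^T \tilde{X}$, then using Theorem 7 on vectors $\tilde{X}$, we can find $S \in \binom{[n]}{r/\varepsilon}$ in time $O(rn^4)$ such that

\[
\mathrm{Tr}(\tilde{X}^T \tilde{X}_S^{\perp} \tilde{X}) \le \frac{1}{1 - \varepsilon} \sum_{i \ge r+1} \sigma_i .
\]

By the von Neumann-Birkhoff theorem, $\mathrm{Tr}(\tilde{X}^T \tilde{X} \mathcal{L})$ is minimized when the $i$th largest eigenvector of $\tilde{X}^T \tilde{X}$ corresponds to the $i$th smallest eigenvector of $\mathcal{L}$:

\[
\mathrm{Tr}(\tilde{X}^T \tilde{X} \mathcal{L}) \ge \sum_i \sigma_i \lambda_i \ge \sum_{i \ge r+1} \sigma_i \lambda_i \ge \lambda_{r+1} \sum_{i \ge r+1} \sigma_i \ge (1 - \varepsilon) \lambda_{r+1} \mathrm{Tr}(\tilde{X}^T \tilde{X}_S^{\perp} \tilde{X}) .
\]

The span of $\{\tilde{X}_u\}_{u\in S}$ is the same as that of $\{X_u\}_{u\in S}$, since $\tilde{X}_u$ differs from $X_u$ only by a scaling factor, which does not affect the span. In particular, $\tilde{X}_S^{\perp} = X_S^{\perp}$, so

\[
\mathrm{Tr}(\tilde{X}^T \tilde{X}_S^{\perp} \tilde{X}) = \mathrm{Tr}(\tilde{X}^T X_S^{\perp} \tilde{X}) = \mathrm{Tr}(X^T X_S^{\perp} X \, \mathrm{diag}(L)) .
\]

The proof is complete by noting that $\mathrm{Tr}(\tilde{X}^T \tilde{X} \mathcal{L}) = \mathrm{Tr}(X^T X L)$. □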
Theorem 31 (Main technical theorem). Given positive integer $r$ and $\varepsilon \in (0, 1)$, let $x$ be a set of vectors satisfying $r' = O(r/\varepsilon^2)$ rounds of Lasserre hierarchy constraints, $x \in \mathrm{Lasserre}^{(r')}(V \times [k])$, with $X = (x_u(f))_{u\in V, f\in[k]^u}$ being the matrix whose columns are the vectors of $x$ corresponding to singletons.

Given any PSD matrix $L \in \mathbb{R}^{(V\times[k]) \times (V\times[k])}$, $L \succeq 0$, we can find a seed set $S^*$ of size at most $r'$ in deterministic time $O(n^5)$ with the following properties. For $\tilde{x}$ randomly sampled from the distribution $\mathcal{D}^*$, $\tilde{x} \sim \mathcal{D}^*$, as described in Definition 25:

1. $\tilde{x}$ is a binary vector, $\tilde{x} \in \{0, 1\}^{V \times [k]}$.

2. $\tilde{x}$ is an indicator function of a proper labeling of $V$. In particular for any $u \in V$, $\sum_{i\in[k]} \tilde{x}_u(i) = 1$.

3. If there exist $u$ and $i$ such that $x_u(i) = x_{\emptyset}$ (equivalently $\|x_u(i)\|^2 = 1$), then $\tilde{x}_u(i)$ is always 1. Similarly, if $x_u(i) = 0$, then $\tilde{x}_u(i)$ is always 0.

4. The expected correlation of $\tilde{x}$ with $L$ is bounded by the correlation of $x$ with $L$ as follows:

\[
\mathbb{E}_{\tilde{x} \sim \mathcal{D}^*} [\tilde{x}^T L \tilde{x}] \le \frac{1 + \varepsilon}{1 - \varepsilon} \cdot \frac{\mathrm{Tr}(X^T X L)}{\min\{\lambda_{r+1}, 1\}} .
\]

Here $X$ denotes the matrix with columns $x_u(i)$, $(u, i) \in V \times [k]$, and $\lambda_{r+1} = \lambda_{r+1}[L; \mathrm{diag}(L)]$ is the $(r+1)$th smallest generalized eigenvalue as defined in Definition 12.

Furthermore this set $S^*$ satisfies the following bound

\[
\mathrm{Tr}(X^T \Pi^{\perp}_{S^*} X \, \mathrm{diag}(L)) + \mathrm{Tr}(X^T \Pi_{S^*} X L) \le \frac{1 + \varepsilon}{1 - \varepsilon} \cdot \frac{\mathrm{Tr}(X^T X L)}{\min\{\lambda_{r+1}, 1\}} \tag{11}
\]

where $\Pi_{S^*}$ is defined as in Definition 27.

Proof. Note that the first three properties follow by construction of $\mathcal{D}^*$. Using Claim 29, it can be seen that the bound (11) is equivalent to item 4. Therefore it suffices to prove item 4.

Let $r_0 = r/\varepsilon$. Consider picking our "seed" nodes in the following iterative way as described in Algorithm 2. Starting with $S(0) \leftarrow \emptyset$, for each $i \in \{1, 2, \ldots\}$, let

\[
\tilde{S}(i) \leftarrow \operatorname{argmin}_{\tilde{S} \in \binom{V \times [k]}{r_0}} \mathrm{Tr}\bigl(X(i-1)^T X(i-1)^{\perp}_{\tilde{S}} X(i-1) \, \mathrm{diag}(L)\bigr),
\]

and let $S(i)$ be the set of nodes for which at least one label appears in $\tilde{S}(i)$,

\[
S(i) \leftarrow \{ u \mid \exists f \in [k]^u \text{ such that } (u, f) \in \tilde{S}(i) \} ; \qquad S_i \leftarrow \bigcup_{j \le i} S(j),
\]

followed by $X(i) \leftarrow \Pi^{\perp}_{S_i} X$. At each step we set $S^* \leftarrow S_i$ and repeat this until $\mathbb{E}_{\tilde{x} \sim \mathcal{D}^*} [\tilde{x}^T L \tilde{x}]$ is at most $\eta \frac{1+\varepsilon}{(1-\varepsilon) \min(1, \lambda_{r+1})}$. Here $\eta = \mathrm{Tr}(X^T X L)$. We will show in Claim 36 that this procedure stops for some $i$ with $i \le \lceil 1/\varepsilon \rceil$.

Note that, by Lasserre constraints, all vectors in $\{x_u(f)\}_{u\in S, f\in[k]^u}$ are linear combinations of vectors in $\{x_S(f)\}_{f\in[k]^S}$. Hence for any subset of nodes $T \subseteq V$ of size at most $r'$, $X^{\perp}_{T \times [k]} \succeq \Pi^{\perp}_T$.

For any $i$, using $\mathcal{D}^*(i)$ to denote the distribution at iteration $i$ with seed set chosen as $S^* \leftarrow S_i$, by Claim 29:

\[
\mathbb{E}_{\tilde{x} \sim \mathcal{D}^*(i)} [\tilde{x}^T L \tilde{x}] = \mathrm{Tr}(X^T \Pi^{\perp}_{S_i} X \, \mathrm{diag}(L)) + \mathrm{Tr}(X^T \Pi_{S_i} X L) . \tag{12}
\]

Let $\xi_i$ be defined as $\xi_i = \mathbb{E}_{\tilde{x} \sim \mathcal{D}^*(i)} [\tilde{x}^T L \tilde{x}]$ so that:

\[
\xi_i = \underbrace{\mathrm{Tr}(X^T \Pi^{\perp}_{S_i} X \, \mathrm{diag}(L))}_{\delta_i} + \underbrace{\mathrm{Tr}(X^T \Pi_{S_i} X L)}_{\eta_i} .
\]

Finally for convenience we define $\lambda'_{r+1}$ as the following:

\[
\lambda'_{r+1} \triangleq (1 - \varepsilon) \min\{\lambda_{r+1}[L; \mathrm{diag}(L)], 1\} . \tag{13}
\]

Note that each iteration takes time at most $O(r_0 n^4)$. If this procedure takes $K$ iterations, we have $r_0 K \le n$, hence the running time is $O(K r_0 n^4) = O(n^5)$.

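Observation 32.

\[
\delta_{i+1} = \mathrm{Tr}(X^T \Pi^{\perp}_{S_{i+1}} X \, \mathrm{diag}(L)) \le \mathrm{Tr}\bigl(X(i)^T \Pi^{\perp}_{S(i+1)} X(i) \, \mathrm{diag}(L)\bigr) .
\]

Proof. Note that $\Pi^{\perp}_{S_i} \Pi^{\perp}_{S(i+1)} \Pi^{\perp}_{S_i} \succeq \Pi^{\perp}_{S_{i+1}}$, since all vectors of the form $x_{S_i}(\cdot)$ and $x_{S(i+1)}(\cdot)$ are linear combinations of vectors $x_{S_{i+1}}(\cdot)$. Using the definition of $X(i)$, $X(i) = \Pi^{\perp}_{S_i} X$, the proof is complete. □

Observation 33. For any $i \ge 0$, we have $\eta_i \le \eta$.

Proof. Note $\Pi_{S_i} \preceq I$. Since $L$ is PSD, $\eta_i = \mathrm{Tr}(X^T \Pi_{S_i} X L) \le \mathrm{Tr}(X^T X L) = \eta$. □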



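Claim 34. For any $i \ge 0$,

\[
\delta_{i+1} \le \frac{\eta - \eta_i}{\lambda'_{r+1}} ,
\]

where $\lambda'_{r+1}$ is defined as in Equation (13).

Proof. Using Observation 32,

\begin{align*}
\delta_{i+1} &\le \mathrm{Tr}\bigl(X(i)^T \Pi^{\perp}_{S(i+1)} X(i) \, \mathrm{diag}(L)\bigr) \\
&\le \mathrm{Tr}\bigl(X(i)^T X^{\perp}_{S(i+1) \times [k]} X(i) \, \mathrm{diag}(L)\bigr) \\
&\le \mathrm{Tr}\bigl(X(i)^T X(i)^{\perp}_{\tilde{S}(i+1)} X(i) \, \mathrm{diag}(L)\bigr) \\
&\le \frac{1}{(1 - \varepsilon) \lambda_{r+1}} \mathrm{Tr}\bigl(X(i)^T X(i) L\bigr) ,
\end{align*}

where the second inequality follows from $\Pi^{\perp}_{S(i+1)} \preceq X^{\perp}_{S(i+1) \times [k]}$, and the third inequality from $\tilde{S}(i+1) \subseteq S(i+1) \times [k]$. For the last inequality we can immediately apply the bound from Lemma 30. Using $(1 - \varepsilon) \lambda_{r+1}[L; \mathrm{diag}(L)] \ge \lambda'_{r+1}$ for $\lambda'_{r+1}$ as defined in Equation (13), and the identity

\[
\mathrm{Tr}\bigl(X(i)^T X(i) L\bigr) = \mathrm{Tr}(X^T \Pi^{\perp}_{S_i} X L) = \mathrm{Tr}(X^T X L) - \mathrm{Tr}(X^T \Pi_{S_i} X L) = \eta - \eta_i ,
\]

we conclude the proof. □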



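Claim 35. If $\xi_{i+1} > \eta \frac{1+\varepsilon}{\lambda'_{r+1}}$, then

\[
\eta \frac{\varepsilon}{\lambda'_{r+1}} + \frac{\eta_i}{\lambda'_{r+1}} < \eta_{i+1} .
\]

Proof. Using Claim 34,

\[
\eta \frac{1+\varepsilon}{\lambda'_{r+1}} < \xi_{i+1} = \delta_{i+1} + \eta_{i+1} \le \frac{\eta - \eta_i}{\lambda'_{r+1}} + \eta_{i+1} .
\]

Hence $\eta \frac{\varepsilon}{\lambda'_{r+1}} + \frac{\eta_i}{\lambda'_{r+1}} < \eta_{i+1}$. □

Claim 36. There exists $i \le \lceil 1/\varepsilon \rceil$ for which $\xi_i \le \eta \frac{1+\varepsilon}{\lambda'_{r+1}}$.

Proof. By contradiction. Let $K = \lceil 1/\varepsilon \rceil$ and assume that for all $i \le K$, $\xi_i > \eta \frac{1+\varepsilon}{\lambda'_{r+1}}$. By Claim 35, and since $\lambda'_{r+1} \le 1$,

\begin{align*}
\eta_1 &> \eta \frac{\varepsilon}{\lambda'_{r+1}} \ge \eta \varepsilon , \\
\eta_2 &> \eta \frac{\varepsilon}{\lambda'_{r+1}} + \frac{\eta_1}{\lambda'_{r+1}} > (1 + 1) \frac{\eta \varepsilon}{\lambda'_{r+1}} \ge \eta \cdot 2\varepsilon , \\
&\;\;\vdots \\
\eta_K &> \eta \frac{\varepsilon}{\lambda'_{r+1}} + \frac{\eta_{K-1}}{\lambda'_{r+1}} > K \frac{\eta \varepsilon}{\lambda'_{r+1}} \ge \eta \cdot K \varepsilon \ge \eta ,
\end{align*}

which contradicts Observation 33. □

This completes the proof of Theorem 31. □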
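The counting argument behind Claims 35 and 36 can be traced numerically. The sketch below iterates the lower bound $\eta_{i+1} > (\varepsilon \eta + \eta_i)/\lambda'_{r+1}$ from Claim 35, normalized to $\eta = 1$ and started at $\eta_0 = 0$, and counts how many steps pass before the bound exceeds $\eta$, at which point Observation 33 would be violated. The function name and the sample parameter values are hypothetical; `lam` stands for $\lambda'_{r+1}$ and is assumed to satisfy $0 < \lambda'_{r+1} \le 1 - \varepsilon$, as Equation (13) guarantees.

```python
import math

def iterations_until_contradiction(eps, lam):
    """Count steps of eta_{i+1} = (eps * eta + eta_i) / lam, with eta = 1
    and eta_0 = 0, until the lower bound on eta_i exceeds eta."""
    eta, eta_i, steps = 1.0, 0.0, 0
    while eta_i <= eta:
        eta_i = (eps * eta + eta_i) / lam
        steps += 1
    return steps

# With eps = 1/4 and lam = 3/4 the bound exceeds eta within ceil(1/eps) = 4 steps.
steps_quarter = iterations_until_contradiction(0.25, 0.75)
steps_tenth = iterations_until_contradiction(0.1, 0.9)
```

Because each step adds at least $\varepsilon \eta$ to the lower bound, the loop cannot run for more than $\lceil 1/\varepsilon \rceil$ steps, which is the iteration count used in the running-time bound.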

8 Algorithm for Independent Set 

Our final algorithmic result is on finding independent sets in a graph. For simplicity, we focus on unweighted graphs, though the extension to graphs with non-negative vertex weights is straightforward. As usual, we denote by $\alpha(G)$ the size of the largest independent set in $G$. Let $d_{\max}$ denote the maximum degree of a vertex of $G$.
Theorem 37. Given $0 < \varepsilon < 1$, positive integer $r$, and a graph $G$ with $d_{\max} \ge 3$, there exists an algorithm to find an independent set $I \subseteq V$ such that

\[
|I| \ge \alpha(G) \cdot \min\left\{ \frac{1}{2 d_{\max}} \left( \frac{1}{(1 - \varepsilon) \min\{2 - \lambda_{n-r-1}(\mathcal{L}), 1\}} - 1 \right)^{-1} , 1 \right\} \tag{14}
\]

in time $n^{O(r/\varepsilon^2)}$.



Remark 10. The above bound (14) implies that if $\lambda_{n-r-1}$, which is the $(r+1)$st largest eigenvalue of the normalized Laplacian $\mathcal{L}$, is very close to 1, then we can find large independent sets in $n^{O(r/\varepsilon^2)}$ time. In particular, if it is at most $1 + \frac{1}{4 d_{\max}}$, then taking $\varepsilon = O(1/d_{\max})$, we can find an optimal independent set. The best approximation ratio for independent set in terms of $d_{\max}$ is about $O\left( \frac{d_{\max} \log\log d_{\max}}{\log d_{\max}} \right)$ [Hal98, Hal02]. The bound (14) gives a better approximation ratio when

An-r-l ^l + O ( w^— ) • □ 



Proof (of Theorem 37). Consider the following integer program for finding the largest independent set in $G$:

\[
\max \ \sum_u x_u(1)
\]
\[
\text{subject to } x_u(1) x_v(1) = 0 \text{ for any edge } e = (u, v) \in E, \qquad x_u(1) + x_u(2) = 1 \text{ for all } u \in V, \qquad x \in \{0, 1\}^{V \times [2]} .
\]

Solve the corresponding Lasserre relaxation using Theorem 39 with $\varepsilon_0 = o(1)$. Let $x$ be the solution found. Note that every quadratic constraint corresponds to a monomial constraint.

From now on, we assume $x$ is a feasible solution to the following problem

\[
\max \ \mathrm{Tr}(X(1)^T X(1))
\]
\[
\text{subject to } \langle x_u(1), x_v(1) \rangle = 0 \text{ for any edge } e = (u, v) \in E, \qquad x \in \mathrm{Lasserre}^{(r')}(V \times [2])
\]

with value $\sum_u \|x_u(1)\|^2 = \mathrm{Tr}(X(1)^T X(1)) = \mu$. Here, $X(1)$ denotes the matrix with columns $x_u(1)$, and henceforth in the proof we will denote $X(1)$ by $X$, $X = X(1)$.

For $S^*$ to be chosen later, pick $\tilde{x} \sim \mathcal{D}^*$ as per Definition 25. For ease of notation, we will denote $S^*$ by $S$ in the ensuing calculations. We convert $\tilde{x}$ into an independent set as follows.

1. For each $u$, if $\tilde{x}_u(1) = 1$ then let $I \leftarrow I \cup \{u\}$ with probability $p_u$, which we will specify later.

2. After the first step, for each edge $e = \{u, v\}$, if $\{u, v\} \subseteq I$, we choose one end point randomly, say $u$, and set $I \leftarrow I \setminus \{u\}$.

By construction, the final set $I$ is an independent set.
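The two conversion steps above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function name, the argument layout and the 5-cycle example are hypothetical, and edges are processed sequentially, so an endpoint is only dropped when both endpoints are still present.

```python
import random

def prune_to_independent_set(candidates, edges, p, rng):
    """Turn a rounded vertex set into an independent set.

    candidates: vertices u whose rounded value for label 1 equals 1
    edges:      iterable of pairs (u, v)
    p:          dict mapping each vertex u to its keep-probability p_u
    """
    # Step 1: keep each candidate u independently with probability p[u].
    kept = {u for u in candidates if rng.random() < p[u]}
    # Step 2: for every edge with both endpoints kept, drop one endpoint
    # chosen uniformly at random.
    for u, v in edges:
        if u in kept and v in kept:
            kept.discard(rng.choice((u, v)))
    return kept

# Hypothetical example: a 5-cycle where every vertex is a candidate.
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
keep_prob = {u: 1.0 for u in range(5)}
result = prune_to_independent_set(range(5), cycle_edges, keep_prob, random.Random(1))
```

After an edge has been processed at most one of its endpoints remains, and later steps only remove vertices, so the returned set contains no edge.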



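Note that for any $u$, the probability that $u$ will be in the final independent set $I$ is at least:

\[
p_u \|x_u(1)\|^2 - \frac{1}{2} \sum_{v\in N(u)} p_u p_v \langle \Pi_S x_u(1), \Pi_S x_v(1) \rangle . \tag{15}
\]

By (15), the expected size of the independent set found by the algorithm satisfies

\[
\mathbb{E}[|I|] \ge \sum_u p_u \|x_u(1)\|^2 - \sum_{\{u,v\}\in E} p_u p_v \langle \Pi_S x_u(1), \Pi_S x_v(1) \rangle . \tag{16}
\]

Note that for every edge $\{u, v\} \in E$,

\[
\langle \Pi_S x_u(1), \Pi_S x_v(1) \rangle = \sum_{f\in[2]^S} \frac{\langle x_S(f), x_u(1) \rangle \langle x_S(f), x_v(1) \rangle}{\|x_S(f)\|^2} \ge 0 . \tag{17}
\]

We now consider two cases.

Case 1: $\langle \Pi_S x_u(1), \Pi_S x_v(1) \rangle = 0$ for all edges $\{u, v\} \in E$. In this case, we take $p_u = 1$ for all $u \in V$ and by (16), we find an independent set of expected size at least $\mu \ge \alpha(G)$.

Case 2: In this case, we have

\[
\sum_{\{u,v\}\in E} \langle \Pi_S x_u(1), \Pi_S x_v(1) \rangle = \frac{1}{2} \mathrm{Tr}(X^T \Pi_S X A) > 0 , \tag{18}
\]

where $A$ is the adjacency matrix of $G$. Let $\mathcal{A} = D^{-1/2} A D^{-1/2}$ be the normalized adjacency matrix, and define

\[
\xi \triangleq \frac{\mathrm{Tr}(X^T \Pi_S X \mathcal{A})}{\mathrm{Tr}(X^T X)} . \tag{19}
\]

By (17) and (18), we have $\xi > 0$.

We now pick $p_u = \frac{\alpha}{\sqrt{d_u}}$ for all $u \in V$, where we will optimize the choice of $\alpha$ shortly. For this choice, we have

\begin{align*}
\mathbb{E}[|I|] &\ge \alpha \sum_u \frac{1}{\sqrt{d_u}} \|x_u(1)\|^2 - \frac{1}{2} \alpha^2 \, \mathrm{Tr}(X^T \Pi_S X \mathcal{A}) \\
&\ge \frac{\alpha}{\sqrt{d_{\max}}} \sum_u \|x_u(1)\|^2 - \frac{1}{2} \alpha^2 \, \mathrm{Tr}(X^T \Pi_S X \mathcal{A}) \\
&= \mu \left( \frac{\alpha}{\sqrt{d_{\max}}} - \frac{1}{2} \alpha^2 \frac{\mathrm{Tr}(X^T \Pi_S X \mathcal{A})}{\mathrm{Tr}(X^T X)} \right) .
\end{align*}

This expression is maximized when $\alpha = \frac{1}{\xi \sqrt{d_{\max}}}$, for which it becomes:

\[
\mathbb{E}[|I|] \ge \frac{\mu}{2 d_{\max}} \cdot \frac{1}{\xi} . \tag{20}
\]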
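We now describe how we choose $S = S^*$. Note that the matrix $I + \mathcal{A}$ is positive semidefinite with diagonal entries equal to 1. By applying Theorem 31 to the matrix $I + \mathcal{A}$ (acting on the coordinates of label 1), we will choose $S$ such that

\[
\mathrm{Tr}(X^T \Pi_S^{\perp} X) + \mathrm{Tr}(X^T \Pi_S X (I + \mathcal{A})) \le \frac{\mathrm{Tr}(X^T X (I + \mathcal{A}))}{\lambda'} = \frac{1}{\lambda'} \mathrm{Tr}(X^T X) \quad \text{since } \mathrm{Tr}(X^T X \mathcal{A}) = 0 ,
\]

where $\lambda' = (1 - \varepsilon) \min\{\lambda_{r+1}(I + \mathcal{A}), 1\} = (1 - \varepsilon) \min\{2 - \lambda_{n-r-1}(\mathcal{L}), 1\}$.

On the other hand,

\begin{align*}
\frac{\mathrm{Tr}(X^T \Pi_S^{\perp} X) + \mathrm{Tr}(X^T \Pi_S X (I + \mathcal{A}))}{\mathrm{Tr}(X^T X)}
&= \frac{\mathrm{Tr}(X^T \Pi_S^{\perp} X) + \mathrm{Tr}(X^T \Pi_S X) + \mathrm{Tr}(X^T \Pi_S X \mathcal{A})}{\mathrm{Tr}(X^T X)} \\
&= \frac{\mathrm{Tr}(X^T X) + \mathrm{Tr}(X^T \Pi_S X \mathcal{A})}{\mathrm{Tr}(X^T X)} = 1 + \xi .
\end{align*}

Thus, for such a choice of $S$, we obtain that $\xi \le \frac{1}{\lambda'} - 1$. Substituting this back into Equation (20), we have

\[
\mathbb{E}[|I|] \ge \frac{\mu}{2 d_{\max}} \cdot \frac{1}{1/\lambda' - 1} . \qquad □
\]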
A Approximately Solving SDP



In this section, we will show how to solve the SDPs arising from the Lasserre relaxations of prob- 
lems considered in this paper. 

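Notation 38. For any matrix $Y \in \mathbb{R}^{M \times M}$, we will use $\vec{Y}$ to denote the "vectorization" of $Y$: listing each entry of the matrix $Y$ linearly in some canonical order. For example, for any two symmetric matrices $Y, Z \in \mathbb{R}^{M \times M}$,

\[
\mathrm{Tr}(YZ) = \langle Y, Z \rangle = \vec{Y}^T \vec{Z} .
\]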
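The identity between the trace of a product and the inner product of vectorizations can be checked directly. The sketch below uses row-major order as the "canonical order" (an assumption; any fixed order works) and two small symmetric matrices; the helper names are hypothetical.

```python
def vec(Y):
    """Vectorize a square matrix (list of rows) in row-major order."""
    return [entry for row in Y for entry in row]

def trace_of_product(Y, Z):
    """Tr(YZ) = sum over i, j of Y[i][j] * Z[j][i]."""
    n = len(Y)
    return sum(Y[i][j] * Z[j][i] for i in range(n) for j in range(n))

# Two symmetric 2x2 matrices (hypothetical example data).
Y = [[1, 2], [2, 3]]
Z = [[0, 1], [1, 4]]

trace_value = trace_of_product(Y, Z)
inner_product = sum(a * b for a, b in zip(vec(Y), vec(Z)))
```

For non-symmetric matrices the same inner product equals $\mathrm{Tr}(Y^T Z)$ instead, which is why symmetry matters for the identity as stated.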

Theorem 39. Consider a QIP minimization problem on $k$ labels with PSD objective matrix $A \in \mathbb{R}^{(V\times[k]) \times (V\times[k])}$, $A \succeq 0$, of the form

\[
\text{minimize } x^T A x ,
\]

at most $(kn)^{O(1)}$ many linear inequalities $\{(b_i, c_i)\}_i$ of the form

\[
\langle b_i, x \rangle \ge c_i , \quad \text{with } \|b_i\|_{\max} \le 1 ,
\]

and a set of monomial constraints $\{(T_i, g_i, h_i)\}_i$, where $T_i \subseteq V$, $g_i \in [k]^{T_i}$ and $h_i \in \{0, 1\}$, of the form

\[
\prod_{u\in T_i} x_u(g_i(u)) = h_i .^{8}
\]

Then for any positive real $\varepsilon_0 > 0$, there exists an algorithm to find a solution $\{x_S(f)\}_{S,f}$ for the corresponding $r$-round Lasserre relaxation in time $(kn)^{O(r)} \cdot O(\log(1/\varepsilon_0))$ with the following properties:

1. If we let $X$ denote the matrix with columns corresponding to vectors $x_u(i)$ for all $(u, i) \in V \times [k]$, then

\[
\sum_{(u,i),(v,j)} \langle x_u(i), x_v(j) \rangle A_{(u,i),(v,j)} = \mathrm{Tr}(X^T X A) \le \mathrm{Opt} + \varepsilon_0 ,
\]

where Opt denotes the optimum value for the SDP formulation.

2. It satisfies all Lasserre constraints exactly. In particular $x \in \mathrm{Lasserre}^{(r)}(V \times [k])$.

3. For any $S \in \binom{V}{\le r}$ and $f \in [k]^S$, if we let $X_{(S,f)}$ denote the matrix with columns corresponding to vectors $x_{S \cup \{u\}}(f \circ i_u)$, for all $(u, i) \in V \times [k]$, then

(a) For each linear inequality constraint $(b_i, c_i)$,

\[
\sum_{u, i\in[k]} b_i(u, i) \langle x_S(f), x_u(i) \rangle = \mathrm{Tr}(X_{(S,f)}^T X_{(S,f)} \, \mathrm{diag}(b_i)) \ge c_i \|x_S(f)\|^2 - \varepsilon_0 .
\]

(b) For each monomial equality constraint $(T_i, g_i, h_i)$, provided that $|S \cup T_i| \le r$,

\[
\|x_{S \cup T_i}(f \circ g_i)\|^2 = h_i \|x_S(f)\|^2 .
\]

8 For QIP problems, these are used to express boundary conditions. For the independent set problem, such constraints are necessary to express independence constraints, which are of the form $x_u(1) x_v(1) = 0$.
Proof. We will try to find $X \in \mathbb{R}^{M \times M}$ representing the Gram matrix associated with the Lasserre vectors. The rows and columns of $X$ are associated with pairs $(S, f)$ where $S \in \binom{V}{\le r}$ and $f \in [k]^S$. For any $S, T \in \binom{V}{\le r}$, $f \in [k]^S$ and $g \in [k]^T$, the corresponding entry of $X$, $X_{(S,f),(T,g)}$, represents the inner product $\langle x_S(f), x_T(g) \rangle$. Note that if we let $A = S \cup T$ and $h = f \circ g$, by Lasserre constraints, if $f|_{S \cap T} = g|_{S \cap T}$ then there exists a unique value $z_{(A,h)}$ such that $X_{(S,f),(T,g)} = z_{(A,h)}$, else $X_{(S,f),(T,g)} = 0$.

Let $Z$ be the affine space of vectors $z$ with coordinates indexed by $(A, h)$, where $A \in \binom{V}{\le 2r}$ and $h \in [k]^A$, satisfying the constraints:

1. $z_{\emptyset} = 1$,

2. For any $u \in V$, $z_{(u,k)} = 1 - \sum_{j \le k-1} z_{(u,j)}$,

3. For given $S$ and $f \in [k]^S$, for any $(T_i, g_i, h_i)$, if $h_i = 0$, then $z_{(S \cup T_i, f \circ g_i)} = 0$, else if $h_i = 1$, then $z_{(S \cup T_i, f \circ g_i)} = z_{(S,f)}$.

Let $\mathcal{Y}(z)$ denote the matrix whose entry in row $(S, f)$ and column $(T, g)$, where $S, T \in \binom{V}{\le r}$, $f \in [k]^S$ and $g \in [k]^T$, is given by $z_{(A,h)}$ where $A = S \cup T$ and $h = f \circ g$ if $f|_{S \cap T} = g|_{S \cap T}$, and 0 otherwise. It is easy to check that if $X = \mathcal{Y}(z)$ is positive semidefinite, then all Lasserre consistency constraints are met by any set of vectors whose Gram matrix equals $X$.

We will also assume the objective matrix $A$ is extended to $\mathbb{R}^{M \times M}$ by padding with 0's. Then for such a positive semidefinite matrix $X = \mathcal{Y}(z)$ parametrized by $z \in Z$:

1. The objective function has the form $\mathrm{Tr}(AX)$.

2. Each Lasserre and monomial constraint is always satisfied by construction.

3. Corresponding to each $S$ and $f \in [k]^S$,

(a) Each linear inequality constraint $\langle b_i, x \rangle \ge c_i$ takes the form $\mathrm{Tr}(B_i X) \ge 0$.

From now on we will use $B$ to denote the matrix whose rows correspond to the linear inequality constraints 3a over all $S$ and $f$. Thus the SDP we need to solve can be represented as

\[
\text{Find } z \text{ minimizing } \mathrm{Tr}(AX), \qquad \text{subject to } B \vec{X} \ge 0, \qquad X = \mathcal{Y}(z) \succeq 0 .
\]

We can use the interior point method [NN06] to solve this problem within an accuracy of $\varepsilon' = \varepsilon_0 \|B\|_{\min}$ in time $(kn)^{O(r)} \cdot O(\log(1/\varepsilon_0))$ and find $z$ with the following properties: For $X = \mathcal{Y}(z)$,

• $\mathrm{Tr}(AX) \le \mathrm{Opt} + \varepsilon_0$,

• $X \succeq 0$,

• $B \vec{X} \ge -\varepsilon_0$,

• (by construction) $X$ satisfies all monomial and Lasserre consistency constraints exactly.

□